AUTHORS’ PREFACE


The present course is based on lectures given by I. M.
Gelfand in the Mechanics and Mathematics Department
of Moscow State University. However, the book goes
considerably beyond the material actually presented in
the lectures. Our aim is to give a treatment of the
elements of the calculus of variations in a form which is
both easily understandable and sufficiently modern.
Considerable attention is devoted to physical applications
of variational methods, e.g., canonical equations,
variational principles of mechanics and conservation laws.


The reader who merely wishes to become familiar with
the most basic concepts and methods of the calculus of
variations need only study the first chapter. The first
three chapters, taken together, form a more comprehensive
course on the elements of the calculus of variations,
but one which is still quite elementary (involving
only necessary conditions for extrema). The first six
chapters contain, more or less, the material given in
the usual university course in the calculus of variations
(with applications to the mechanics of systems with a
finite number of degrees of freedom), including the
theory of fields (presented in a somewhat novel way)
and sufficient conditions for weak and strong extrema.
Chapter 7 is devoted to the application of variational
methods to the study of systems with infinitely many
degrees of freedom. Chapter 8 contains a brief treatment
of direct methods in the calculus of variations.


The authors are grateful to M. A. Yevgrafov and A. G. 
Kostyuchenko, who read the book in manuscript and 
made many useful comments. 
I. M. G.
S. V. F. 


TRANSLATOR’S PREFACE 


This book is a modern introduction to the calculus of
variations and certain of its ramifications, and I trust
that its fresh and lively point of view will serve to make
it a welcome addition to the English-language literature
on the subject. The present edition is rather different
from the Russian original. With the authors’ consent,
I have given free rein to the tendency of any mathematically
educated translator to assume the functions
of annotator and stylist. In so doing, I have had two
special assets: 1) A substantial list of revisions and
corrections from Professor S. V. Fomin himself, and
2) A variety of helpful suggestions from Professor J. T.
Schwartz of New York University, who read the entire
translation in typescript.


The problems appearing at the end of each of the eight 
chapters and two appendices were made specifically for 
the English edition, and many of them comment further 
on the corresponding parts of the text. A variety of 
Russian sources have played an important role in the 
synthesis of this material. In particular, I have consulted 
the textbooks on the calculus of variations by N. I. 
Akhiezer, by L. E. Elsgolts, and by M. A. Lavrentev 
and L. A. Lyusternik, as well as Volume 2 of the well-known
problem collection by N. M. Gyunter and R. O.
Kuzmin, and Chapter 3 of G. E. Shilov’s “Mathematical 
Analysis, A Special Course.” 


At the end of the book I have added a Bibliography 
containing suggestions for collateral and supplementary 
reading. This list is not intended as an exhaustive catalog
of the literature, and is in fact confined to books
available in English. 

R.A.S. 
ELEMENTS OF THE THEORY


1. Functionals. Some Simple Variational Problems


Variable quantities called functionals play an important role in many 
problems arising in analysis, mechanics, geometry, etc. By a functional, we 
mean a correspondence which assigns a definite (real) number to each function 
(or curve) belonging to some class. Thus, one might say that a functional is 
a kind of function, where the independent variable is itself a function (or 
curve). The following are examples of functionals: 


1. Consider the set of all rectifiable plane curves.¹ A definite number is
associated with each such curve, namely, its length. Thus, the length 
of a curve is a functional defined on the set of rectifiable curves. 


2. Suppose that each rectifiable plane curve is regarded as being made 
out of some homogeneous material. Then if we associate with each 
such curve the ordinate of its center of mass, we again obtain a 
functional. 


3. Consider all possible paths joining two given points A and B in the 
plane. Suppose that a particle can move along any of these paths, 
and let the particle have a definite velocity v(x, y) at the point (x, y). 
Then we obtain a functional by associating with each path the time the 
particle takes to traverse the path. 


1 In analysis, the length of a curve is defined as the limiting length of a polygonal line
inscribed in the curve (i.e., with vertices lying on the curve) as the maximum length of 
the chords forming the polygonal line goes to zero. If this limit exists and is finite, the 
curve is said to be rectifiable. 
4. Let y(x) be an arbitrary continuously differentiable function, defined
on the interval [a, b].² Then the formula

\[ J[y] = \int_a^b y'^2(x)\,dx \]

defines a functional on the set of all such functions y(x).


5. As a more general example, let F(x, y, z) be a continuous function of
three variables. Then the expression

\[ J[y] = \int_a^b F[x, y(x), y'(x)]\,dx, \tag{1} \]

where y(x) ranges over the set of all continuously differentiable functions
defined on the interval [a, b], defines a functional. By choosing
different functions F(x, y, z), we obtain different functionals. For
example, if

\[ F(x, y, z) = \sqrt{1 + z^2}, \]

J[y] is the length of the curve y = y(x), as in the first example, while if

\[ F(x, y, z) = z^2, \]

J[y] reduces to the case considered in the fourth example. In what
follows, we shall be concerned mainly with functionals of the form (1).


Particular instances of problems involving the concept of a functional 
were considered more than three hundred years ago, and in fact, the first 
important results in this area are due to Euler (1707-1783). Nevertheless, 
up to now, the “calculus of functionals” still does not have methods of a 
generality comparable to the methods of classical analysis (i.e., the ordinary 
“calculus of functions”). The most developed branch of the “calculus of
functionals” is concerned with finding the maxima and minima of functionals,
and is called the “calculus of variations.” Actually, it would be more
appropriate to call this subject the “calculus of variations in the narrow 
sense,” since the significance of the concept of the variation of a functional 
is by no means confined to its applications to the problem of determining the 
extrema of functionals. 

We now indicate some typical examples of variational problems, by which
we mean problems involving the determination of maxima and minima of 
functionals. 


1. Find the shortest plane curve joining two points A and B, i.e., find the 
curve y = y(x) for which the functional 


\[ \int_a^b \sqrt{1 + y'^2}\,dx \]


achieves its minimum. The curve in question turns out to be the straight 
line segment joining A and B. 


2 By [a, b] is meant the closed interval $a \le x \le b$.
2. Let A and B be two fixed points. Then the time it takes a particle to
slide under the influence of gravity along some path joining A and B
depends on the choice of the path (curve), and hence is a functional.
The curve such that the particle takes the least time to go from A to B
is called the brachistochrone. The brachistochrone problem was posed
by John Bernoulli in 1696, and played an important part in the
development of the calculus of variations. The problem was solved by John
Bernoulli, James Bernoulli, Newton, and L’Hospital. The
brachistochrone turns out to be a cycloid, lying in the vertical plane and passing
through A and B (cf. p. 26).


3. The following variational problem, called the isoperimetric problem, 
was solved by Euler: Among all closed curves of a given length l, find the
curve enclosing the greatest area. The required curve turns out to be 
a circle. 


All of the above problems involve functionals which can be written in 
the form 


\[ \int_a^b F(x, y, y')\,dx. \]


Such functionals have a “localization property” consisting of the fact that 
if we divide the curve y = y(x) into parts and calculate the value of the 
functional for each part, the sum of the values of the functional for the 
separate parts equals the value of the functional for the whole curve. It is 
just these functionals which are usually considered in the calculus of variations. 
As an example of a “nonlocal functional,” consider the expression

\[ \frac{\int_a^b x\sqrt{1 + y'^2}\,dx}{\int_a^b \sqrt{1 + y'^2}\,dx}, \]

which gives the abscissa of the center of mass of a curve y = y(x), $a \le x \le b$,
made out of some homogeneous material.

An important factor in the development of the calculus of variations was 
the investigation of a number of mechanical and physical problems, e.g., 
the brachistochrone problem mentioned above. In turn, the methods of the 
calculus of variations are widely applied in various physical problems. It 
should be emphasized that the application of the calculus of variations to 
physics does not consist merely in the solution of individual, albeit very 
important problems. The so-called “variational principles,” to be discussed
in Chapters 4 and 7, are essentially a manifestation of very general physical 
laws, which are valid in diverse branches of physics, ranging from classical 
mechanics to the theory of elementary particles. 

To understand the basic meaning of the problems and methods of the 
calculus of variations, it is very important to see how they are related to 
problems of classical analysis, i.e., to the study of functions of n variables.
Thus, consider a functional of the form 


\[ J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B. \]


Here, each curve is assigned a certain number. To find a related function 
of the sort considered in classical analysis, we may proceed as follows. 
Using the points 

\[ a = x_0, x_1, \ldots, x_n, x_{n+1} = b, \]


we divide the interval [a, b] into n + 1 equal parts. Then we replace the 
curve y = y(x) by the polygonal line with vertices 


\[ (x_0, A), (x_1, y(x_1)), \ldots, (x_n, y(x_n)), (x_{n+1}, B), \]


and we approximate the functional J[y] by the sum 


\[ J(y_1, \ldots, y_n) = \sum_{i=1}^{n+1} F\left(x_i, y_i, \frac{y_i - y_{i-1}}{h}\right) h, \tag{2} \]

where

\[ y_i = y(x_i), \qquad h = x_i - x_{i-1}. \]


Each polygonal line is uniquely determined by the ordinates $y_1, \ldots, y_n$ of
its vertices (recall that $y_0 = A$ and $y_{n+1} = B$ are fixed), and the sum (2)
is therefore a function of the n variables $y_1, \ldots, y_n$. Thus, as an
approximation, we can regard the variational problem as the problem of finding the
extrema of the function $J(y_1, \ldots, y_n)$. In solving variational problems,
Euler made extensive use of this method of finite differences. By replacing
smooth curves by polygonal lines, he reduced the problem of finding extrema
of a functional to the problem of finding extrema of a function of n variables,
and then he obtained exact solutions by passing to the limit as $n \to \infty$.
In this sense, functionals can be regarded as “functions of infinitely many 
variables” [i.e., the values of the function y(x) at separate points], and the 
calculus of variations can be regarded as the corresponding analog of 
differential calculus. 
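
The finite-difference sum (2) is easy to evaluate numerically. The following is a minimal sketch, not part of the original text: the function names, the choice of the parabola y = x^2 on [0, 1], and the closed-form reference length are all illustrative assumptions. It evaluates the sum (2) for the arc-length integrand and shows the value approaching the exact length as n grows.

```python
import math

def discretized_functional(F, y, a, b, n):
    """Approximate J[y] = integral of F(x, y, y') over [a, b] by the sum (2).

    The interval is split into n + 1 equal parts, and y' is replaced by the
    slope of the polygonal line on each part.
    """
    h = (b - a) / (n + 1)
    xs = [a + i * h for i in range(n + 2)]
    ys = [y(x) for x in xs]
    total = 0.0
    for i in range(1, n + 2):
        slope = (ys[i] - ys[i - 1]) / h
        total += F(xs[i], ys[i], slope) * h
    return total

def arc_length_integrand(x, y, z):
    """F(x, y, z) = sqrt(1 + z^2), the integrand of the length functional."""
    return math.sqrt(1.0 + z * z)

def parabola(x):
    """Illustrative curve y = x^2 (an assumption made for this example)."""
    return x * x

# Closed-form length of y = x^2 on [0, 1], used only as a reference value.
exact = (2.0 * math.sqrt(5.0) + math.asinh(2.0)) / 4.0

coarse = discretized_functional(arc_length_integrand, parabola, 0.0, 1.0, 9)
fine = discretized_functional(arc_length_integrand, parabola, 0.0, 1.0, 999)

print(exact, coarse, fine)
```

For this particular integrand the sum (2) is exactly the length of the inscribed polygonal line, so it approaches the true length from below as the partition is refined.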


2. Function Spaces 


In the study of functions of n variables, it is convenient to use geometric 
language, by regarding a set of n numbers $(y_1, \ldots, y_n)$ as a point in an
n-dimensional space. In just the same way, geometric language is useful 
when studying functionals. Thus, we shall regard each function y(x) 
belonging to some class as a point in some space, and spaces whose elements 
are functions will be called function spaces. 

In the study of functions of a finite number n of independent variables, 
it is sufficient to consider a single space, i.e., n-dimensional Euclidean space 
$\mathcal{E}_n$.³ However, in the case of function spaces, there is no such “universal”
space. In fact, the nature of the problem under consideration determines 
the choice of the function space. For example, if we are dealing with a 
functional of the form 


\[ \int_a^b F(x, y, y')\,dx, \]


it is natural to regard the functional as defined on the set of all functions 
with a continuous first derivative, while in the case of a functional of the 
form 


\[ \int_a^b F(x, y, y', y'')\,dx, \]


the appropriate function space is the set of all functions with two continuous 
derivatives. Therefore, in studying functionals of various types, it is 
reasonable to use various function spaces. 

The concept of continuity plays an important role for functionals, just 
as it does for the ordinary functions considered in classical analysis. In 
order to formulate this concept for functionals, we must somehow introduce 
a concept of “closeness” for elements in a function space. This is most 
conveniently done by introducing the concept of the norm of a function, 
analogous to the concept of the distance between a point in Euclidean space 
and the origin of coordinates. Although in what follows we shall always 
be concerned with function spaces, it will be most convenient to introduce 
the concept of a norm in a more general and abstract form, by introducing 
the concept of a normed linear space. 


By a linear space, we mean a set $\mathcal{R}$ of elements x, y, z, ... of any kind,
for which the operations of addition and multiplication by (real) numbers
$\alpha, \beta, \ldots$ are defined and obey the following axioms:


1. x + y = y + x;

2. (x + y) + z = x + (y + z);

3. There exists an element 0 (the zero element) such that x + 0 = x for
any $x \in \mathcal{R}$;⁴

4. For each $x \in \mathcal{R}$, there exists an element -x such that x + (-x) = 0;

5. $1 \cdot x = x$;

6. $\alpha(\beta x) = (\alpha\beta)x$;

7. $(\alpha + \beta)x = \alpha x + \beta x$;

8. $\alpha(x + y) = \alpha x + \alpha y$.


3 See e.g., G. E. Shilov, An Introduction to the Theory of Linear Spaces, translated by 
R. A. Silverman, Prentice-Hall, Inc., Englewood Cliffs, N. J. (1961), Theorem 14 and 
Corollary, pp. 48-49. 

4 By $x \in \mathcal{R}$, we mean that the element x belongs to the set $\mathcal{R}$. In these axioms, x, y
and z are arbitrary elements of $\mathcal{R}$, while $\alpha$ and $\beta$ are arbitrary real numbers.
A linear space $\mathcal{R}$ is said to be normed, if each element $x \in \mathcal{R}$ is assigned a
nonnegative number $\|x\|$, called the norm of x, such that


1. $\|x\| = 0$ if and only if x = 0;
2. $\|\alpha x\| = |\alpha|\,\|x\|$;
3. $\|x + y\| \le \|x\| + \|y\|$.


In a normed linear space, we can talk about distances between elements,
by defining the distance between x and y to be the quantity $\|x - y\|$.

The elements of a normed linear space can be objects of any kind, e.g., 
numbers, vectors (directed line segments), matrices, functions, etc. The 
following normed linear spaces are important for our subsequent purposes: 


1. The space $\mathcal{C}$, or more precisely $\mathcal{C}(a, b)$, consisting of all continuous
functions y(x) defined on a (closed) interval [a, b]. By addition of
elements of $\mathcal{C}$ and multiplication of elements of $\mathcal{C}$ by numbers, we mean
ordinary addition of functions and multiplication of functions by
numbers, while the norm is defined as the maximum of the absolute
value, i.e.,

\[ \|y\|_0 = \max_{a \le x \le b} |y(x)|. \]

Thus, in the space $\mathcal{C}$, the distance between the function y*(x) and the
function y(x) does not exceed $\varepsilon$ if the graph of the function y*(x) lies
inside a strip of width $2\varepsilon$ (in the vertical direction) “bordering” the
graph of the function y(x), as shown in Figure 1.

[Figure 1]


2. The space $\mathcal{D}_1$, or more precisely $\mathcal{D}_1(a, b)$, consisting of all functions
y(x) defined on an interval [a, b] which are continuous and have
continuous first derivatives. The operations of addition and
multiplication by numbers are the same as in $\mathcal{C}$, but the norm is defined by
the formula

\[ \|y\|_1 = \max_{a \le x \le b} |y(x)| + \max_{a \le x \le b} |y'(x)|. \]

Thus, two functions in $\mathcal{D}_1$ are regarded as close together if both the
functions themselves and their first derivatives are close together, since

\[ \|y - z\|_1 < \varepsilon \]

implies that

\[ |y(x) - z(x)| < \varepsilon, \qquad |y'(x) - z'(x)| < \varepsilon \]

for all $a \le x \le b$.
3. The space $\mathcal{D}_n$, or more precisely $\mathcal{D}_n(a, b)$, consisting of all functions
y(x) defined on an interval [a, b] which are continuous and have
continuous derivatives up to order n inclusive, where n is a fixed integer.
Addition of elements of $\mathcal{D}_n$ and multiplication of elements of $\mathcal{D}_n$ by
numbers are defined just as in the preceding cases, but the norm is
now defined by the formula

\[ \|y\|_n = \sum_{i=0}^{n} \max_{a \le x \le b} |y^{(i)}(x)|, \]

where $y^{(i)}(x) = (d/dx)^i y(x)$ and $y^{(0)}(x)$ denotes the function y(x) itself.
Thus, two functions in $\mathcal{D}_n$ are regarded as close together if the values
of the functions themselves and of all their derivatives up to order n
inclusive are close together. It is easily verified that all the axioms of a
normed linear space are actually satisfied for each of the spaces $\mathcal{C}$, $\mathcal{D}_1$
and $\mathcal{D}_n$.
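
To make the difference between these norms concrete, here is a small numerical sketch. It is an illustration only: the helper names, the uniform sampling grid, and the test function 0.01 sin(100x) on [0, pi] are assumptions introduced for this example, and the maxima are approximated by sampling rather than computed exactly. It shows a function that is very close to zero in the norm of the space C but not in the norm of the space D_1.

```python
import math

def sup_norm(f, a, b, samples=10001):
    """Approximate max |f(x)| on [a, b] by sampling a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

def norm_C(y, a, b):
    """Norm of the space C(a, b): the maximum of |y|."""
    return sup_norm(y, a, b)

def norm_D1(y, dy, a, b):
    """Norm of the space D_1(a, b): max |y| + max |y'|.

    The derivative dy is supplied by the caller rather than computed here.
    """
    return sup_norm(y, a, b) + sup_norm(dy, a, b)

def y(x):
    """Small-amplitude, fast oscillation (illustrative choice)."""
    return 0.01 * math.sin(100.0 * x)

def dy(x):
    """Exact derivative of y."""
    return math.cos(100.0 * x)

dist_C = norm_C(y, 0.0, math.pi)        # distance from the zero function in C
dist_D1 = norm_D1(y, dy, 0.0, math.pi)  # distance from the zero function in D_1

print(dist_C, dist_D1)
```

The first distance is about 0.01, while the second is slightly larger than 1, because the derivative of the oscillation is not small.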


Similarly, we can introduce spaces of functions of several variables, e.g., 
the space of continuous functions of n variables, the space of functions of n 
variables with continuous first derivatives, etc. After a norm has been 
introduced in the linear space $\mathcal{R}$ (which may be a function space), it is
natural to talk about continuity of functionals defined on $\mathcal{R}$:


DEFINITION. The functional J[y] is said to be continuous at the point
$\hat{y} \in \mathcal{R}$ if for any $\varepsilon > 0$, there is a $\delta > 0$ such that

\[ |J[y] - J[\hat{y}]| < \varepsilon \tag{3} \]

provided that $\|y - \hat{y}\| < \delta$.


Remark 1. The inequality (3) is equivalent to the two inequalities

\[ J[y] - J[\hat{y}] > -\varepsilon \tag{4} \]

and

\[ J[y] - J[\hat{y}] < \varepsilon. \tag{5} \]

If in the definition of continuity, we replace (3) by (4), J[y] is said to be lower
semicontinuous at $\hat{y}$, while if we replace (3) by (5), J[y] is said to be upper
semicontinuous at $\hat{y}$. These concepts will be needed in Chapter 8.


Remark 2. At first, it might appear that the space $\mathcal{C}$, which is the largest
of those enumerated, would be adequate for the study of variational problems.
However, this is not the case. In fact, as already mentioned, one of the basic
types of functionals considered in the calculus of variations has the form

\[ J[y] = \int_a^b F(x, y, y')\,dx. \]

It is easy to see that such a functional (e.g., arc length) will be continuous if
we interpret closeness of functions as closeness in the space $\mathcal{D}_1$. However,
in general, the functional will not be continuous if we use the norm
introduced in the space $\mathcal{C}$,⁵ even though it is continuous in the norm of the space
$\mathcal{D}_1$. Since we want to be able to use ordinary analytic methods, e.g., passage
to the limit, then, given a functional, it is reasonable to choose a function
space such that the functional is continuous.
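
The failure of continuity in the norm of the space C can be observed numerically. The sketch below is illustrative only: the family y_n(x) = sin(n^2 x)/n on [0, pi], the helper names, and the number of segments are assumptions made for this example, and the lengths are approximated by inscribed polygonal lines. As n grows, the curves approach the segment y = 0 in the maximum norm, while their lengths grow without bound.

```python
import math

def polygon_length(y, a, b, segments=20000):
    """Length of the polygonal line inscribed in the graph of y over [a, b]."""
    h = (b - a) / segments
    total = 0.0
    prev = y(a)
    for i in range(1, segments + 1):
        cur = y(a + i * h)
        total += math.hypot(h, cur - prev)
        prev = cur
    return total

def wiggle(n):
    """y_n(x) = sin(n^2 x) / n: amplitude 1/n, slopes of size up to n."""
    return lambda x: math.sin(n * n * x) / n

results = []
for n in (1, 5, 10):
    sup_distance = 1.0 / n  # max |y_n - 0| on [0, pi]
    length = polygon_length(wiggle(n), 0.0, math.pi)
    results.append((n, sup_distance, length))

# Length of the limiting curve y = 0 on [0, pi], for comparison.
straight_length = polygon_length(lambda x: 0.0, 0.0, math.pi)

for n, sup_distance, length in results:
    print(n, sup_distance, length)
print(straight_length)
```

The same curves are not close to y = 0 in the norm of the space D_1, since their derivatives have maximum n, so there is no conflict with the continuity of arc length in that norm.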


Remark 3. So far, we have talked about linear spaces and functionals 
defined on them. However, in many variational problems, we have to deal 
with functionals defined on sets of functions which do not form linear spaces. 
In fact, the set of functions (or curves) satisfying the constraints of a given 
variational problem, called the admissible functions (or admissible curves), 
is in general not a linear space. For example, the admissible curves for the 
“simplest” variational problem (see Sec. 4) are the smooth plane curves
passing through two fixed points, and the sum of two such curves does not 
pass through the two points. Nevertheless, the concept of a normed linear 
space and the related concepts of the distance between functions, continuity 
of functionals, etc., play an important role in the calculus of variations. A 
similar situation is encountered in elementary analysis, where, in dealing
with functions of n variables, it is convenient to use the concept of an
n-dimensional Euclidean space $\mathcal{E}_n$, even though the domain of definition of
a function may not be a linear subspace of $\mathcal{E}_n$.


3. The Variation of a Functional. A Necessary Condition 
for an Extremum 


3.1. In this section, we introduce the concept of the variation (or 
differential) of a functional, analogous to the concept of the differential of a 
function of n variables. The concept will then be used to find extrema of
functionals. First, we give some preliminary facts and definitions. 


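DEFINITION. Given a normed linear space $\mathcal{R}$, let each element $h \in \mathcal{R}$
be assigned a number $\varphi[h]$, i.e., let $\varphi[h]$ be a functional defined on $\mathcal{R}$. Then
$\varphi[h]$ is said to be a (continuous) linear functional if

1. $\varphi[\alpha h] = \alpha\varphi[h]$ for any $h \in \mathcal{R}$ and any real number $\alpha$;

2. $\varphi[h_1 + h_2] = \varphi[h_1] + \varphi[h_2]$ for any $h_1, h_2 \in \mathcal{R}$;

3. $\varphi[h]$ is continuous (for all $h \in \mathcal{R}$).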

Example 1. If we associate with each function $h(x) \in \mathcal{C}(a, b)$ its value at
a fixed point $x_0$ in [a, b], i.e., if we define the functional $\varphi[h]$ by the formula

\[ \varphi[h] = h(x_0), \]

then $\varphi[h]$ is a linear functional on $\mathcal{C}(a, b)$.
5 Arc length is a typical example of such a functional. For every curve, we can find
another curve arbitrarily close to the first in the sense of the norm of the space $\mathcal{C}$, whose
length differs from that of the first curve by a factor of 10, say.
Example 2. The integral

\[ \varphi[h] = \int_a^b h(x)\,dx \]

defines a linear functional on $\mathcal{C}(a, b)$.

Example 3. The integral

\[ \varphi[h] = \int_a^b \alpha(x)h(x)\,dx, \]

where $\alpha(x)$ is a fixed function in $\mathcal{C}(a, b)$, defines a linear functional on $\mathcal{C}(a, b)$.


Example 4. More generally, the integral

\[ \varphi[h] = \int_a^b [\alpha_0(x)h(x) + \alpha_1(x)h'(x) + \cdots + \alpha_n(x)h^{(n)}(x)]\,dx, \tag{6} \]

where the $\alpha_i(x)$ are fixed functions in $\mathcal{C}(a, b)$, defines a linear functional
on $\mathcal{D}_n(a, b)$.


Suppose the linear functional (6) vanishes for all A(x) belonging to some 
class. Then what can be said about the functions $\alpha_i(x)$? Some typical
results in this direction are given by the following lemmas: 


LEMMA 1. If $\alpha(x)$ is continuous in [a, b], and if

\[ \int_a^b \alpha(x)h(x)\,dx = 0 \]

for every function $h(x) \in \mathcal{C}(a, b)$ such that h(a) = h(b) = 0, then $\alpha(x) = 0$
for all x in [a, b].


Proof. Suppose the function $\alpha(x)$ is nonzero, say positive, at some
point in [a, b]. Then $\alpha(x)$ is also positive in some interval $[x_1, x_2]$
contained in [a, b]. If we set

\[ h(x) = (x - x_1)(x_2 - x) \]

for x in $[x_1, x_2]$ and h(x) = 0 otherwise, then h(x) obviously satisfies
the conditions of the lemma. However,

\[ \int_a^b \alpha(x)h(x)\,dx = \int_{x_1}^{x_2} \alpha(x)(x - x_1)(x_2 - x)\,dx > 0, \]

since the integrand is positive (except at $x_1$ and $x_2$). This contradiction
proves the lemma.


Remark. The lemma still holds if we replace $\mathcal{C}(a, b)$ by $\mathcal{D}_n(a, b)$. To
see this, we use the same proof with

\[ h(x) = [(x - x_1)(x_2 - x)]^{n+1} \]

for x in $[x_1, x_2]$ and h(x) = 0 otherwise.
LEMMA 2. If $\alpha(x)$ is continuous in [a, b], and if

\[ \int_a^b \alpha(x)h'(x)\,dx = 0 \]

for every function $h(x) \in \mathcal{D}_1(a, b)$ such that h(a) = h(b) = 0, then
$\alpha(x) = c$ for all x in [a, b], where c is a constant.


Proof. Let c be the constant defined by the condition

\[ \int_a^b [\alpha(x) - c]\,dx = 0, \]

and let

\[ h(x) = \int_a^x [\alpha(\xi) - c]\,d\xi, \]

so that h(x) automatically belongs to $\mathcal{D}_1(a, b)$ and satisfies the
conditions h(a) = h(b) = 0. Then on the one hand,

\[ \int_a^b [\alpha(x) - c]h'(x)\,dx = \int_a^b \alpha(x)h'(x)\,dx - c[h(b) - h(a)] = 0, \]

while on the other hand,

\[ \int_a^b [\alpha(x) - c]h'(x)\,dx = \int_a^b [\alpha(x) - c]^2\,dx. \]

It follows that $\alpha(x) - c = 0$, i.e., $\alpha(x) = c$, for all x in [a, b].
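
The construction used in this proof can be checked numerically for a specific non-constant function. The following sketch is only an illustration under stated assumptions: the choice $\alpha(x) = x$ on [0, 1], the helper names, and the midpoint quadrature rule are all introduced here and are not part of the text. It builds c and h(x) as in the proof and confirms that h vanishes at both end points while the integral of $\alpha h'$ equals the integral of $(\alpha - c)^2$, which is positive.

```python
def integrate(f, a, b, steps=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    w = (b - a) / steps
    return sum(f(a + (i + 0.5) * w) for i in range(steps)) * w

a, b = 0.0, 1.0

def alpha(x):
    """Illustrative non-constant alpha(x) (an assumption for this example)."""
    return x

# Mean value of alpha, so that the integral of alpha - c over [a, b] vanishes.
c = integrate(alpha, a, b) / (b - a)

def h(x):
    """h(x) = integral from a to x of [alpha(xi) - c] d(xi), as in the proof."""
    return integrate(lambda xi: alpha(xi) - c, a, x, steps=200)

def h_prime(x):
    """h'(x) = alpha(x) - c, by the fundamental theorem of calculus."""
    return alpha(x) - c

lhs = integrate(lambda x: alpha(x) * h_prime(x), a, b)  # integral of alpha * h'
rhs = integrate(lambda x: (alpha(x) - c) ** 2, a, b)    # integral of (alpha - c)^2

print(c, h(a), h(b), lhs, rhs)
```

For this choice both integrals equal 1/12, so the hypothesis of the lemma fails for the admissible function h, as the proof predicts for any $\alpha(x)$ that is not constant.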


The next lemma will be needed in Chapter 8: 
LEMMA 3. If $\alpha(x)$ is continuous in [a, b], and if

\[ \int_a^b \alpha(x)h''(x)\,dx = 0 \]

for every function $h(x) \in \mathcal{D}_2(a, b)$ such that h(a) = h(b) = 0 and
h'(a) = h'(b) = 0, then $\alpha(x) = c_0 + c_1 x$ for all x in [a, b], where $c_0$ and $c_1$
are constants.


Proof. Let $c_0$ and $c_1$ be defined by the conditions

\[ \int_a^b [\alpha(x) - c_0 - c_1 x]\,dx = 0, \]

\[ \int_a^b dx \int_a^x [\alpha(\xi) - c_0 - c_1 \xi]\,d\xi = 0, \]

and let

\[ h(x) = \int_a^x d\xi \int_a^\xi [\alpha(t) - c_0 - c_1 t]\,dt, \]

so that h(x) automatically belongs to $\mathcal{D}_2(a, b)$ and satisfies the conditions
h(a) = h(b) = 0, h'(a) = h'(b) = 0. Then on the one hand,

\[ \int_a^b [\alpha(x) - c_0 - c_1 x]h''(x)\,dx
   = \int_a^b \alpha(x)h''(x)\,dx - c_0[h'(b) - h'(a)] - c_1 \int_a^b x h''(x)\,dx \]
\[ = -c_1[bh'(b) - ah'(a)] + c_1[h(b) - h(a)] = 0, \]

while on the other hand,

\[ \int_a^b [\alpha(x) - c_0 - c_1 x]h''(x)\,dx = \int_a^b [\alpha(x) - c_0 - c_1 x]^2\,dx. \]

It follows that $\alpha(x) - c_0 - c_1 x = 0$, i.e., $\alpha(x) = c_0 + c_1 x$, for all x in
[a, b].
LEMMA 4. If $\alpha(x)$ and $\beta(x)$ are continuous in [a, b], and if

\[ \int_a^b [\alpha(x)h(x) + \beta(x)h'(x)]\,dx = 0 \tag{8} \]

for every function $h(x) \in \mathcal{D}_1(a, b)$ such that h(a) = h(b) = 0, then $\beta(x)$
is differentiable, and $\beta'(x) = \alpha(x)$ for all x in [a, b].

Proof. Setting

\[ A(x) = \int_a^x \alpha(\xi)\,d\xi, \]

and integrating by parts, we find that

\[ \int_a^b \alpha(x)h(x)\,dx = -\int_a^b A(x)h'(x)\,dx, \]

i.e., (8) can be rewritten as

\[ \int_a^b [-A(x) + \beta(x)]h'(x)\,dx = 0. \]

But, according to Lemma 2, this implies that

\[ \beta(x) - A(x) = \text{const}, \]

and hence by the definition of A(x),

\[ \beta'(x) = \alpha(x), \]

for all x in [a, b], as asserted. We emphasize that the differentiability
of the function $\beta(x)$ was not assumed in advance.


3.2. We now introduce the concept of the variation (or differential) of a
functional. Let J[y] be a functional defined on some normed linear space,
and let

\[ \Delta J[h] = J[y + h] - J[y] \]

be its increment, corresponding to the increment h = h(x) of the “independent
variable” y = y(x). If y is fixed, $\Delta J[h]$ is a functional of h, in general a
nonlinear functional. Suppose that

\[ \Delta J[h] = \varphi[h] + \varepsilon\|h\|, \]

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. Then the functional
J[y] is said to be differentiable, and the principal linear part of the increment
$\Delta J[h]$, i.e., the linear functional $\varphi[h]$ which differs from $\Delta J[h]$ by an
infinitesimal of order higher than 1 relative to $\|h\|$, is called the variation (or
differential) of J[y] and is denoted by $\delta J[h]$.⁶
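
As a concrete check of this definition, consider the functional of the fourth example of Sec. 1, J[y] equal to the integral of y'^2, taken over [0, 1]. Expanding the square gives the candidate variation $\delta J[h] = 2\int_0^1 y'h'\,dx$, with remainder $\int_0^1 h'^2\,dx$. The sketch below is an illustration under stated assumptions: the base function y = x^2, the increments h(x) = t sin(pi x), the helper names, and the midpoint quadrature are all choices made here. It computes the quantity $\varepsilon$ from the definition, measured with the norm of the space D_1, and shows it shrinking as t decreases.

```python
import math

def integrate(f, a, b, steps=4000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    w = (b - a) / steps
    return sum(f(a + (i + 0.5) * w) for i in range(steps)) * w

def J(dy):
    """J[y] = integral over [0, 1] of y'(x)^2, written in terms of y'."""
    return integrate(lambda x: dy(x) ** 2, 0.0, 1.0)

def dy(x):
    """Derivative of the illustrative base function y(x) = x^2."""
    return 2.0 * x

def epsilon(t):
    """Return (Delta J[h] - delta J[h]) / ||h||_1 for h(x) = t sin(pi x)."""
    dh = lambda x: t * math.pi * math.cos(math.pi * x)
    increment = J(lambda x: dy(x) + dh(x)) - J(dy)
    variation = integrate(lambda x: 2.0 * dy(x) * dh(x), 0.0, 1.0)
    norm_h = abs(t) + abs(t) * math.pi  # max |h| + max |h'| on [0, 1]
    return (increment - variation) / norm_h

ts = (1.0, 0.1, 0.01)
eps_values = [epsilon(t) for t in ts]

for t, eps in zip(ts, eps_values):
    print(t, eps)
```

For this family of increments the remainder is quadratic in t while the norm of h is linear in t, so $\varepsilon$ is proportional to t and tends to zero with $\|h\|_1$.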


THEOREM 1. The differential of a differentiable functional is unique.


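Proof. First, we note that if $\varphi[h]$ is a linear functional and if

\[ \frac{\varphi[h]}{\|h\|} \to 0 \]

as $\|h\| \to 0$, then $\varphi[h] \equiv 0$, i.e., $\varphi[h] = 0$ for all h. In fact, suppose
$\varphi[h_0] \ne 0$ for some $h_0 \ne 0$. Then, setting

\[ h_n = \frac{h_0}{n}, \qquad \lambda = \frac{\varphi[h_0]}{\|h_0\|}, \]

we see that $\|h_n\| \to 0$ as $n \to \infty$, but

\[ \lim_{n \to \infty} \frac{\varphi[h_n]}{\|h_n\|} = \lim_{n \to \infty} \frac{n\varphi[h_0]}{n\|h_0\|} = \lambda \ne 0, \]

contrary to hypothesis.
Now, suppose the differential of the functional J[y] is not uniquely
defined, so that

\[ \Delta J[h] = \varphi_1[h] + \varepsilon_1\|h\|, \]
\[ \Delta J[h] = \varphi_2[h] + \varepsilon_2\|h\|, \]

where $\varphi_1[h]$ and $\varphi_2[h]$ are linear functionals, and $\varepsilon_1, \varepsilon_2 \to 0$ as $\|h\| \to 0$.
This implies

\[ \varphi_1[h] - \varphi_2[h] = \varepsilon_2\|h\| - \varepsilon_1\|h\|, \]

and hence $\varphi_1[h] - \varphi_2[h]$ is an infinitesimal of order higher than 1 relative
to $\|h\|$. But since $\varphi_1[h] - \varphi_2[h]$ is a linear functional, it follows from the
first part of the proof that $\varphi_1[h] - \varphi_2[h]$ vanishes identically, as asserted.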


Next, we use the concept of the variation (or differential) of a functional
to establish a necessary condition for a functional to have an extremum.
We begin by recalling the corresponding concepts from analysis. Let
$F(x_1, \ldots, x_n)$ be a differentiable function of n variables. Then $F(x_1, \ldots, x_n)$
is said to have a (relative) extremum at the point $(\hat{x}_1, \ldots, \hat{x}_n)$ if

\[ \Delta F = F(x_1, \ldots, x_n) - F(\hat{x}_1, \ldots, \hat{x}_n) \]

has the same sign for all points $(x_1, \ldots, x_n)$ belonging to some neighborhood
of $(\hat{x}_1, \ldots, \hat{x}_n)$, where the extremum $F(\hat{x}_1, \ldots, \hat{x}_n)$ is a minimum if $\Delta F \ge 0$
and a maximum if $\Delta F \le 0$.

Analogously, we say that the functional J[y] has a (relative) extremum
for $y = \hat{y}$ if $J[y] - J[\hat{y}]$ does not change its sign in some neighborhood of


6 Strictly speaking, of course, the increment and the variation of J[y] are functionals
of two arguments y and h, and to emphasize this fact, we might write $\Delta J[y; h] =
\delta J[y; h] + \varepsilon\|h\|$.
the curve $y = \hat{y}(x)$. Subsequently, we shall be concerned with functionals
defined on some set of continuously differentiable functions, and the functions
themselves can be regarded either as elements of the space $\mathcal{C}$ or elements
of the space $\mathcal{D}_1$. Corresponding to these two possibilities, we can define
two kinds of extrema: We shall say that the functional J[y] has a weak
extremum for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the
same sign for all y in the domain of definition of the functional which satisfy
the condition $\|y - \hat{y}\|_1 < \varepsilon$, where $\|\cdot\|_1$ denotes the norm in the space $\mathcal{D}_1$.
On the other hand, we shall say that the functional J[y] has a strong extremum
for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the same sign
for all y in the domain of definition of the functional which satisfy the
condition $\|y - \hat{y}\|_0 < \varepsilon$, where $\|\cdot\|_0$ denotes the norm in the space $\mathcal{C}$.
It is clear that every strong extremum is simultaneously a weak extremum,
since if $\|y - \hat{y}\|_1 < \varepsilon$, then $\|y - \hat{y}\|_0 < \varepsilon$, a fortiori, and hence, if $J[\hat{y}]$ is
an extremum with respect to all y such that $\|y - \hat{y}\|_0 < \varepsilon$, then $J[\hat{y}]$ is
certainly an extremum with respect to all y such that $\|y - \hat{y}\|_1 < \varepsilon$.
However, the converse is not true in general, i.e., a weak extremum may not be a
strong extremum. As a rule, finding a weak extremum is simpler than
finding a strong extremum. The reason for this is that the functionals
usually considered in the calculus of variations are continuous in the norm
of the space $\mathcal{D}_1$ (as noted at the end of the previous section), and this
continuity can be exploited in the theory of weak extrema. In general, however,
our functionals will not be continuous in the norm of the space $\mathcal{C}$.


THEOREM 2. A necessary condition for the differentiable functional
J[y] to have an extremum for $y = \hat{y}$ is that its variation vanish for $y = \hat{y}$,
i.e., that

\[ \delta J[h] = 0 \]

for $y = \hat{y}$ and all admissible h.


Proof. To be explicit, suppose J[y] has a minimum for $y = \hat{y}$.
According to the definition of the variation $\delta J[h]$, we have

\[ \Delta J[h] = \delta J[h] + \varepsilon\|h\|, \tag{9} \]

where $\varepsilon \to 0$ as $\|h\| \to 0$. Thus, for sufficiently small $\|h\|$, the sign of
$\Delta J[h]$ will be the same as the sign of $\delta J[h]$. Now, suppose that
$\delta J[h_0] \ne 0$ for some admissible $h_0$. Then for any $\alpha > 0$, no matter
how small, we have

\[ \delta J[-\alpha h_0] = -\delta J[\alpha h_0]. \]

Hence, (9) can be made to have either sign for arbitrarily small $\|h\|$.
But this is impossible, since by hypothesis J[y] has a minimum for $y = \hat{y}$,
i.e.,

\[ \Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \ge 0 \]

for all sufficiently small $\|h\|$. This contradiction proves the theorem.
Remark. In elementary analysis, it is proved that for a function to have
a minimum, it is necessary not only that its first differential vanish (df = 0), 
but also that its second differential be nonnegative. Consideration of the 
analogous problem for functionals will be postponed until Chapter 5. 


4. The Simplest Variational Problem. Euler’s Equation 


4.1. We begin our study of concrete variational problems by considering
what might be called the “simplest” variational problem, which can be
formulated as follows: Let F(x, y, z) be a function with continuous first and
second (partial) derivatives with respect to all its arguments. Then, among
all functions y(x) which are continuously differentiable for $a \le x \le b$ and
satisfy the boundary conditions

\[ y(a) = A, \qquad y(b) = B, \tag{10} \]

find the function for which the functional

\[ J[y] = \int_a^b F(x, y, y')\,dx \tag{11} \]

has a weak extremum. In other words, the simplest variational problem
consists of finding a weak extremum of a functional of the form (11), where
the class of admissible curves (see p. 8) consists of all smooth curves joining 
two points. The first two examples on pp. 2, 3, involving the brachistochrone 
and the shortest distance between two points, are variational problems of 
just this type. To apply the necessary condition for an extremum (found in 
Sec. 3.2) to the problem just formulated, we have to be able to calculate the 
variation of a functional of the type (11). We now derive the appropriate 
formula for this variation. 
Suppose we give y(x) an increment h(x), where, in order for the function

\[ y(x) + h(x) \]

to continue to satisfy the boundary conditions, we must have

\[ h(a) = h(b) = 0. \]

Then, since the corresponding increment of the functional (11) equals

\[ \Delta J = J[y + h] - J[y] = \int_a^b F(x, y + h, y' + h')\,dx - \int_a^b F(x, y, y')\,dx \]
\[ = \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx, \]

it follows by using Taylor’s theorem that

\[ \Delta J = \int_a^b [F_y(x, y, y')h + F_{y'}(x, y, y')h']\,dx + \cdots, \tag{12} \]
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where the subscripts denote partial derivatives with respect to the corres- 
ponding arguments, and the dots denote terms of order higher than | relative 
tohandh’. The integral in the right-hand side of (12) represents the principal 
linear part of the increment AJ, and hence the variation of J[y] is 


6 
87 = [LEG yy + Fy Gs ys yh] dx. 
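The relation between the increment $\Delta J$ and its principal linear part $\delta J$ can be checked numerically. The following is only a sketch under assumed choices that do not come from the text: the integrand $F = y'^2 + y^2$, the base curve $y = x^2$ on $[0, 1]$, and the increment $h = \varepsilon \sin \pi x$ (which vanishes at both end points). For this quadratic integrand the remainder $\Delta J - \delta J$ should equal $\int_0^1 (h'^2 + h^2)\,dx = \varepsilon^2(\pi^2 + 1)/2$, i.e., it is of second order in $\varepsilon$.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Illustrative integrand F(x, y, y') = y'^2 + y^2 and its partial derivatives
# (an assumption made for this sketch, not an example from the text).
def F(x, y, yp):
    return yp * yp + y * y

def F_y(x, y, yp):
    return 2.0 * y

def F_yp(x, y, yp):
    return 2.0 * yp

# Base curve y = x^2 and its derivative (also an illustrative choice).
def y(x):
    return x * x

def yp(x):
    return 2.0 * x

def increment_and_variation(eps):
    """Return (Delta J, delta J) for the increment h = eps * sin(pi x)."""
    h = lambda x: eps * math.sin(math.pi * x)
    hp = lambda x: eps * math.pi * math.cos(math.pi * x)
    delta_J = integrate(
        lambda x: F(x, y(x) + h(x), yp(x) + hp(x)) - F(x, y(x), yp(x)), 0.0, 1.0)
    var_J = integrate(
        lambda x: F_y(x, y(x), yp(x)) * h(x) + F_yp(x, y(x), yp(x)) * hp(x), 0.0, 1.0)
    return delta_J, var_J

def remainder(eps):
    """Delta J minus its principal linear part."""
    dJ, vJ = increment_and_variation(eps)
    return dJ - vJ
```

Shrinking $\varepsilon$ by a factor of 10 should shrink `remainder(eps)` by a factor of about 100, which is what "terms of order higher than 1" means here.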


According to Theorem 2 of Sec. 3.2, a necessary condition for $J[y]$ to have
an extremum for $y = y(x)$ is that

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx = 0 \tag{13}$$

for all admissible $h$. But according to Lemma 4 of Sec. 3.1, (13) implies
that

$$F_y - \frac{d}{dx} F_{y'} = 0, \tag{14}$$

a result known as Euler’s equation.⁷ Thus, we have proved

THEOREM 1. Let $J[y]$ be a functional of the form

$$\int_a^b F(x, y, y')\,dx,$$

defined on the set of functions $y(x)$ which have continuous first derivatives
in $[a, b]$ and satisfy the boundary conditions $y(a) = A$, $y(b) = B$. Then
a necessary condition for $J[y]$ to have an extremum for a given function
$y(x)$ is that $y(x)$ satisfy Euler’s equation⁸

$$F_y - \frac{d}{dx} F_{y'} = 0.$$

The integral curves of Euler’s equation are called extremals. Since 
Euler’s equation is a second-order differential equation, its solution will in 
general depend on two arbitrary constants, which are determined from the 
boundary conditions y(a) = A, y(b) = B. The problem usually considered 
in the theory of differential equations is that of finding a solution which is 
defined in the neighborhood of some point and satisfies given initial con- 
ditions (Cauchy’s problem). However, in solving Euler’s equation, we are 
looking for a solution which is defined over all of some fixed region and 
satisfies given boundary conditions. Therefore, the question of whether 
or not a certain variational problem has a solution does not just reduce to the 


⁷ We emphasize that the existence of the derivative $(d/dx)F_{y'}$ is not assumed in
advance, but follows from the very same lemma.

⁸ This condition is necessary for a weak extremum. Since every strong extremum is
simultaneously a weak extremum, any necessary condition for a weak extremum is
also a necessary condition for a strong extremum.

usual existence theorems for differential equations. In this regard, we now
state a theorem due to Bernstein,⁹ concerning the existence and uniqueness of
solutions “in the large” of an equation of the form

$$y'' = F(x, y, y'). \tag{15}$$

THEOREM 2 (Bernstein). If the functions $F$, $F_y$ and $F_{y'}$ are continuous
at every finite point $(x, y)$ for any finite $y'$, and if a constant $k > 0$ and
functions

$$\alpha = \alpha(x, y) \ge 0, \qquad \beta = \beta(x, y) \ge 0$$

(which are bounded in every finite region of the plane) can be found such
that

$$F_y(x, y, y') > k, \qquad |F(x, y, y')| \le \alpha y'^2 + \beta,$$

then one and only one integral curve of equation (15) passes through any
two points $(a, A)$ and $(b, B)$ with different abscissas ($a \ne b$).


Equation (13) gives a necessary condition for an extremum, but in general, 
one which is not sufficient. The question of sufficient conditions for an 
extremum will be considered in Chapter 5. In many cases, however, 
Euler’s equation by itself is enough to give a complete solution of the prob- 
lem. In fact, the existence of an extremum is often clear from the physical or 
geometric meaning of the problem, e.g., in the brachistochrone problem, 
the problem concerning the shortest distance between two points, etc. If in 
such a case there exists only one extremal satisfying the boundary conditions 
of the problem, this extremal must perforce be the curve for which the 
extremum is achieved. 

For a functional of the form

$$\int_a^b F(x, y, y')\,dx$$

Euler’s equation is in general a second-order differential equation, but it
may turn out that the curve for which the functional has its extremum is
not twice differentiable. For example, consider the functional

$$J[y] = \int_{-1}^{1} y^2 (2x - y')^2\,dx,$$

where

$$y(-1) = 0, \qquad y(1) = 1.$$

The minimum of $J[y]$ equals zero and is achieved for the function

$$y = y(x) = \begin{cases} 0 & \text{for } -1 \le x \le 0, \\ x^2 & \text{for } 0 < x \le 1, \end{cases}$$

⁹ S. N. Bernstein, Sur les équations du calcul des variations, Ann. Sci. Ecole Norm.
Sup., 29, 431-485 (1912).

which has no second derivative for $x = 0$. Nevertheless, $y(x)$ satisfies
the appropriate Euler equation. In fact, since in this case

$$F(x, y, y') = y^2 (2x - y')^2,$$

it follows that all the functions

$$F_y = 2y(2x - y')^2, \qquad F_{y'} = -2y^2(2x - y'), \qquad \frac{d}{dx} F_{y'}$$

vanish identically for $-1 \le x \le 1$. Thus, despite the fact that Euler’s
equation is of the second order and $y''(x)$ does not exist everywhere in
$[-1, 1]$, substitution of $y(x)$ into Euler’s equation converts it into an identity.

We now give conditions guaranteeing that a solution of Euler’s equation
has a second derivative:


THEOREM 3. Suppose $y = y(x)$ has a continuous first derivative and
satisfies Euler’s equation

$$F_y - \frac{d}{dx} F_{y'} = 0.$$

Then, if the function $F(x, y, y')$ has continuous first and second derivatives
with respect to all its arguments, $y(x)$ has a continuous second derivative
at all points $(x, y)$ where

$$F_{y'y'}[x, y(x), y'(x)] \ne 0.$$

Proof. Consider the difference

$$\Delta F_{y'} = F_{y'}(x + \Delta x, y + \Delta y, y' + \Delta y') - F_{y'}(x, y, y')$$
$$= \Delta x\,\bar{F}_{y'x} + \Delta y\,\bar{F}_{y'y} + \Delta y'\,\bar{F}_{y'y'},$$

where the overbar indicates that the corresponding derivatives are evaluated
along certain intermediate curves. We divide this difference by
$\Delta x$, and consider the limit of the resulting expression

$$\bar{F}_{y'x} + \frac{\Delta y}{\Delta x}\bar{F}_{y'y} + \frac{\Delta y'}{\Delta x}\bar{F}_{y'y'}$$

as $\Delta x \to 0$. (This limit exists, since $F_{y'}$ has a derivative with respect to
$x$, which, according to Euler’s equation, equals $F_y$.) Since, by hypothesis,
the second derivatives of $F(x, y, z)$ are continuous, then, as
$\Delta x \to 0$, $\bar{F}_{y'x}$ converges to $F_{y'x}$, i.e., to the value of $\partial^2 F/\partial y'\,\partial x$ at the point
$x$. It follows from the existence of $y'$ and the continuity of the second
derivative $F_{y'y}$ that the second term $(\Delta y/\Delta x)\bar{F}_{y'y}$ also has a limit as
$\Delta x \to 0$. But then the third term also has a limit (since the limit of the
sum of the three terms exists), i.e., the limit

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x}\bar{F}_{y'y'}$$

exists. As $\Delta x \to 0$, $\bar{F}_{y'y'}$ converges to $F_{y'y'} \ne 0$, and hence

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x} = y''(x)$$

exists. Finally, from the equation

$$\frac{d}{dx} F_{y'} - F_y = 0,$$

we can find an expression for $y''$, from which it is clear that $y''$ is
continuous wherever $F_{y'y'} \ne 0$. This proves the theorem.


Remark. Here it is assumed that the extremals are smooth.¹⁰ In Sec. 15
we shall consider the case where the solution of a variational problem may
only be piecewise smooth, i.e., may have “corners” at certain points.


4.2. Euler’s equation (14) plays a fundamental role in the calculus of
variations, and is in general a second-order differential equation. We now
indicate some special cases where Euler’s equation can be reduced to a first-order
differential equation, or where its solution can be obtained entirely
in terms of quadratures (i.e., by evaluating integrals).


Case 1. Suppose the integrand does not depend on $y$, i.e., let the functional
under consideration have the form

$$\int_a^b F(x, y')\,dx,$$

where $F$ does not contain $y$ explicitly. In this case, Euler’s equation becomes

$$\frac{d}{dx} F_{y'} = 0,$$

which obviously has the first integral

$$F_{y'} = C, \tag{16}$$

where $C$ is a constant. This is a first-order differential equation which
does not contain $y$. Solving (16) for $y'$, we obtain an equation of the form

$$y' = f(x, C),$$

from which $y$ can be found by a quadrature.


Case 2. If the integrand does not depend on $x$, i.e., if

$$J[y] = \int_a^b F(y, y')\,dx,$$

then

$$F_y - \frac{d}{dx} F_{y'} = F_y - F_{y'y}\,y' - F_{y'y'}\,y''. \tag{17}$$

¹⁰ We say that the function $y(x)$ is smooth in an interval $[a, b]$ if it is continuous in
$[a, b]$, and has a continuous derivative in $[a, b]$. We say that $y(x)$ is piecewise smooth in
$[a, b]$ if it is continuous everywhere in $[a, b]$, and has a continuous derivative in $[a, b]$
except possibly at a finite number of points.

Multiplying (17) by $y'$, we obtain

$$F_y\,y' - F_{y'y}\,y'^2 - F_{y'y'}\,y'y'' = \frac{d}{dx}(F - y'F_{y'}).$$

Thus, in this case, Euler’s equation has the first integral

$$F - y'F_{y'} = C,$$

where $C$ is a constant.
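The first integral of Case 2 can be illustrated numerically. The sketch below assumes the integrand $F(y, y') = y'^2 - y^2$, which is not an example from the text; its Euler equation is $y'' = -y$, so $y = \sin(x + c)$ is an extremal, and along it $F - y'F_{y'} = -(y'^2 + y^2)$ should stay constant. A curve that is not an extremal (here $y = x^2$) is included for contrast.

```python
import math

# Illustrative integrand with no explicit x-dependence (an assumption for this sketch).
def F(y, yp):
    return yp * yp - y * y

def F_yp(y, yp):
    return 2.0 * yp

def first_integral(y, yp, x):
    """Value of F - y' F_{y'} at the point x of the curve y(x) with derivative yp(x)."""
    return F(y(x), yp(x)) - yp(x) * F_yp(y(x), yp(x))

# An extremal: y = sin(x + 0.3) solves y'' = -y.
extremal = lambda x: math.sin(x + 0.3)
extremal_p = lambda x: math.cos(x + 0.3)

# A curve that does not satisfy Euler's equation.
parabola = lambda x: x * x
parabola_p = lambda x: 2.0 * x

xs = [0.0, 0.5, 1.0, 2.0, 3.0]
along_extremal = [first_integral(extremal, extremal_p, x) for x in xs]
along_parabola = [first_integral(parabola, parabola_p, x) for x in xs]
```

Along the extremal every entry of `along_extremal` equals $-1$ up to rounding, while the entries of `along_parabola` vary with $x$.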


Case 3. If $F$ does not depend on $y'$, Euler’s equation takes the form

$$F_y(x, y) = 0,$$

and hence is not a differential equation, but a “finite” equation, whose
solution consists of one or more curves $y = y(x)$.


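Case 4. In a variety of problems, one encounters functionals of the form

$$\int_a^b f(x, y)\sqrt{1 + y'^2}\,dx,$$

representing the integral of a function $f(x, y)$ with respect to the arc length
$s$ ($ds = \sqrt{1 + y'^2}\,dx$). In this case, Euler’s equation can be transformed
into

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = f_y(x, y)\sqrt{1 + y'^2} - \frac{d}{dx}\left[f(x, y)\frac{y'}{\sqrt{1 + y'^2}}\right]$$
$$= f_y\sqrt{1 + y'^2} - f_x\frac{y'}{\sqrt{1 + y'^2}} - f_y\frac{y'^2}{\sqrt{1 + y'^2}} - f\frac{y''}{(1 + y'^2)^{3/2}}$$
$$= \frac{1}{\sqrt{1 + y'^2}}\left[f_y - f_x y' - f\frac{y''}{1 + y'^2}\right] = 0,$$

i.e.,

$$f_y - f_x y' - f\frac{y''}{1 + y'^2} = 0.$$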
Example 1. Suppose that

$$J[y] = \int_1^2 \frac{\sqrt{1 + y'^2}}{x}\,dx, \qquad y(1) = 0, \quad y(2) = 1.$$

The integrand does not contain $y$, and hence Euler’s equation has the form
$F_{y'} = C$ (cf. Case 1). Thus,

$$\frac{y'}{x\sqrt{1 + y'^2}} = C,$$

so that

$$y'^2(1 - C^2x^2) = C^2x^2$$

or

$$y' = \frac{Cx}{\sqrt{1 - C^2x^2}},$$

from which it follows that

$$y = \int \frac{Cx\,dx}{\sqrt{1 - C^2x^2}} = -\frac{1}{C}\sqrt{1 - C^2x^2} + C_1$$

or

$$(y - C_1)^2 + x^2 = \frac{1}{C^2}.$$

Thus, the solution is a circle with its center on the $y$-axis. From the
conditions $y(1) = 0$, $y(2) = 1$, we find that

$$C = \frac{1}{\sqrt{5}}, \qquad C_1 = 2,$$

so that the final solution is

$$(y - 2)^2 + x^2 = 5.$$
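The result of Example 1 is easy to check numerically. The sketch below takes the arc $y = 2 - \sqrt{5 - x^2}$ of the circle $(y - 2)^2 + x^2 = 5$ (the branch passing through both end points), verifies the boundary conditions, and evaluates $F_{y'} = y'/(x\sqrt{1 + y'^2})$ at several points of $[1, 2]$; the function names are illustrative.

```python
import math

R2 = 5.0                   # squared radius 1/C^2 of the circle
C = 1.0 / math.sqrt(R2)    # constant in the first integral F_{y'} = C
C1 = 2.0                   # ordinate of the center of the circle

def y(x):
    """Arc of the circle (y - C1)^2 + x^2 = R2 through (1, 0) and (2, 1)."""
    return C1 - math.sqrt(R2 - x * x)

def yp(x):
    """Derivative of y(x)."""
    return x / math.sqrt(R2 - x * x)

def F_yp(x):
    """F_{y'} for F = sqrt(1 + y'^2) / x, evaluated along the curve."""
    return yp(x) / (x * math.sqrt(1.0 + yp(x) ** 2))

samples = [1.0 + i / 10.0 for i in range(11)]
first_integral = [F_yp(x) for x in samples]
```

Every entry of `first_integral` should coincide with $C = 1/\sqrt{5}$ up to rounding, confirming that the arc satisfies (16).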


Example 2. Among all the curves joining two given points $(x_0, y_0)$ and
$(x_1, y_1)$, find the one which generates the surface of minimum area when rotated
about the $x$-axis. As we know, the area of the surface of revolution generated
by rotating the curve $y = y(x)$ about the $x$-axis is

$$2\pi \int_{x_0}^{x_1} y\sqrt{1 + y'^2}\,dx.$$

Since the integrand does not depend explicitly on $x$, Euler’s equation has the
first integral

$$F - y'F_{y'} = C$$

(cf. Case 2), i.e.,

$$y\sqrt{1 + y'^2} - y\frac{y'^2}{\sqrt{1 + y'^2}} = C$$

or

$$y = C\sqrt{1 + y'^2},$$

so that

$$y' = \sqrt{\frac{y^2 - C^2}{C^2}}.$$

Separating variables, we obtain

$$dx = \frac{C\,dy}{\sqrt{y^2 - C^2}},$$

i.e.,

$$x + C_1 = C \ln \frac{y + \sqrt{y^2 - C^2}}{C},$$

so that

$$y = C \cosh \frac{x + C_1}{C}. \tag{18}$$

Thus, the required curve is a catenary passing through the two given
points. The surface generated by rotation of the catenary is called a catenoid.
The values of the arbitrary constants $C$ and $C_1$ are determined by the
conditions

$$y(x_0) = y_0, \qquad y(x_1) = y_1.$$


It can be shown that the following three cases are possible, depending on
the positions of the points $(x_0, y_0)$ and $(x_1, y_1)$:


1. If a single curve of the form (18) can be drawn through the points
$(x_0, y_0)$ and $(x_1, y_1)$, this curve is the solution of the problem [see
Figure 2(a)].


2. If two extremals can be drawn through the points $(x_0, y_0)$ and $(x_1, y_1)$,
one of the curves actually corresponds to the surface of revolution
of minimum area, and the other does not.


3. If there is no curve of the form (18) passing through the points $(x_0, y_0)$
and $(x_1, y_1)$, there is no surface in the class of smooth surfaces of revolution
which achieves the minimum area. In fact, if the location of the
two points is such that the distance between them is sufficiently large
compared to their distances from the $x$-axis, then the area of the surface
consisting of two circles of radius $y_0$ and $y_1$, plus the segment of the
$x$-axis joining them [see Figure 2(b)] will be less than the area of any
surface of revolution generated by a smooth curve passing through the
points. Thus, in this case the surface of revolution generated by the
polygonal line $Ax_0x_1B$ has the minimum area, and there is no surface
of minimum area in the class of surfaces generated by rotation about the
$x$-axis of smooth curves passing through the given points. (This case,
corresponding to a “broken extremal,” will be discussed further in
Sec. 15.)


FIGURE 2
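The possibilities of two extremals or none can be observed numerically. The sketch below makes a simplifying assumption that is not in the text: the end points are placed symmetrically at $(-a, y_0)$ and $(a, y_0)$, so that $C_1 = 0$ in (18) and the boundary conditions reduce to the single equation $C \cosh(a/C) = y_0$. The helper names (`bisect`, `catenary_constants`, `surface_area`) are hypothetical. The left-hand side has a single minimum in $C$, so the equation has two roots, one root, or none.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def catenary_constants(a, y0):
    """All C > 0 with C*cosh(a/C) = y0, in increasing order (symmetric end points)."""
    g = lambda C: C * math.cosh(a / C) - y0
    # C*cosh(a/C) is smallest at C* = a/t*, where t* solves t*tanh(t) = 1.
    t_star = bisect(lambda t: t * math.tanh(t) - 1.0, 1.0, 2.0)
    C_star = a / t_star
    if g(C_star) > 0:
        return []              # no catenary through the two points
    if g(C_star) == 0:
        return [C_star]
    left = C_star
    while g(left) <= 0:        # move left until g changes sign
        left /= 2.0
    right = C_star
    while g(right) <= 0:       # move right until g changes sign
        right *= 2.0
    return [bisect(g, left, C_star), bisect(g, C_star, right)]

def surface_area(C, a):
    """Area 2*pi * integral of y*sqrt(1 + y'^2) over [-a, a] for y = C*cosh(x/C)."""
    return math.pi * C * (2.0 * a + C * math.sinh(2.0 * a / C))

roots = catenary_constants(1.0, 2.0)      # two extremals
no_roots = catenary_constants(1.0, 1.0)   # points too close to the axis: none
```

For $a = 1$, $y_0 = 2$ the two roots give different areas; in this sketch the shallower catenary (larger $C$) has the smaller area, and that area is also below the area $2\pi y_0^2$ of the two disks.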


Example 3. For the functional

$$J[y] = \int_a^b (x - y)^2\,dx, \tag{19}$$

Euler’s equation reduces to a finite equation (see Case 3), whose solution
is the straight line $y = x$. In fact, the integral (19) vanishes along this line.


5. The Case of Several Variables 


So far, we have considered functionals depending on functions of one 
variable, i.e., on curves. In many problems, however, one encounters 
functionals depending on functions of several independent variables, i.e., on 
surfaces. Such multidimensional problems will be considered in detail in 
Chapter 7. For the time being, we merely give an idea of how the formula- 
tion and solution of the simplest variational problem discussed above carries 
over to the case of functionals depending on surfaces. 

To keep the notation simple, we confine ourselves to the case of two
independent variables, but all our considerations remain the same when there
are $n$ independent variables. Thus, let $F(x, y, z, p, q)$ be a function with
continuous first and second (partial) derivatives with respect to all its arguments,
and consider a functional of the form

$$J[z] = \iint_R F(x, y, z, z_x, z_y)\,dx\,dy, \tag{20}$$

where $R$ is some closed region and $z_x$, $z_y$ are the partial derivatives of
$z = z(x, y)$. Suppose we are looking for a function $z(x, y)$ such that

1. $z(x, y)$ and its first and second derivatives are continuous in $R$;

2. $z(x, y)$ takes given values on the boundary $\Gamma$ of $R$;

3. The functional (20) has an extremum for $z = z(x, y)$.

Since the proof of Theorem 2 of Sec. 3.2 does not depend on the form of
the functional $J$, then, just as in the case of one variable, a necessary condition
for the functional (20) to have an extremum is that its variation (i.e., the
principal linear part of its increment) vanish. However, to find Euler’s
equation for the functional (20), we need the following lemma, which is
analogous to Lemma 1 of Sec. 3.1 (see also the remark on p. 9):


Lemma. If $\alpha(x, y)$ is a fixed function which is continuous in a closed
region $R$, and if the integral

$$\iint_R \alpha(x, y)\,h(x, y)\,dx\,dy \tag{21}$$

vanishes for every function $h(x, y)$ which has continuous first and second
derivatives in $R$ and equals zero on the boundary $\Gamma$ of $R$, then $\alpha(x, y) = 0$
everywhere in $R$.


Proof. Suppose the function $\alpha(x, y)$ is nonzero, say positive, at
some point in $R$. Then $\alpha(x, y)$ is also positive in some circle

$$(x - x_0)^2 + (y - y_0)^2 \le \varepsilon^2 \tag{22}$$

contained in $R$, with center $(x_0, y_0)$ and radius $\varepsilon$. If we set $h(x, y) = 0$
outside the circle (22) and

$$h(x, y) = [(x - x_0)^2 + (y - y_0)^2 - \varepsilon^2]^3$$

inside the circle, then $h(x, y)$ satisfies the conditions of the lemma. However,
in this case, (21) reduces to an integral over the circle (22) and is
obviously nonzero (its integrand has a fixed sign inside the circle). This
contradiction proves the lemma.


In order to apply the necessary condition for an extremum of the functional
(20), i.e., $\delta J = 0$, we must first calculate the variation $\delta J$. Let $h(x, y)$ be an
arbitrary function which has continuous first and second derivatives in the
region $R$ and vanishes on the boundary $\Gamma$ of $R$. Then if $z(x, y)$ belongs to
the domain of definition of the functional (20), so does $z(x, y) + h(x, y)$.
Since

$$\Delta J = J[z + h] - J[z] = \iint_R [F(x, y, z + h, z_x + h_x, z_y + h_y) - F(x, y, z, z_x, z_y)]\,dx\,dy,$$

it follows by using Taylor’s theorem that

$$\Delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h$, $h_x$ and $h_y$.
The integral on the right represents the principal linear part of the increment
$\Delta J$, and hence the variation of $J[z]$ is

$$\delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy.$$
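Next, we observe that

$$\iint_R (F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy$$
$$= \iint_R \left[\frac{\partial}{\partial x}(F_{z_x} h) + \frac{\partial}{\partial y}(F_{z_y} h)\right] dx\,dy - \iint_R \left(\frac{\partial}{\partial x}F_{z_x} + \frac{\partial}{\partial y}F_{z_y}\right) h\,dx\,dy$$
$$= \int_\Gamma (F_{z_x} h\,dy - F_{z_y} h\,dx) - \iint_R \left(\frac{\partial}{\partial x}F_{z_x} + \frac{\partial}{\partial y}F_{z_y}\right) h\,dx\,dy,$$

where in the last step we have used Green’s theorem¹¹

$$\iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_\Gamma (P\,dx + Q\,dy).$$

The integral along $\Gamma$ is zero, since $h(x, y)$ vanishes on $\Gamma$, and hence, comparing
the last two formulas, we find that

$$\delta J = \iint_R \left(F_z - \frac{\partial}{\partial x}F_{z_x} - \frac{\partial}{\partial y}F_{z_y}\right) h(x, y)\,dx\,dy. \tag{23}$$

¹¹ See e.g., D. V. Widder, Advanced Calculus, second edition, Prentice-Hall, Inc.,
Englewood Cliffs, N.J. (1961), p. 223.

Thus, the condition $\delta J = 0$ implies that the double integral (23) vanishes for
any $h(x, y)$ satisfying the stipulated conditions. According to the lemma,
this leads to the following second-order partial differential equation, again
known as Euler’s equation:

$$F_z - \frac{\partial}{\partial x}F_{z_x} - \frac{\partial}{\partial y}F_{z_y} = 0. \tag{24}$$

We are looking for a solution of (24) which takes given values on the
boundary $\Gamma$.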


Example. Find the surface of least area spanned by a given contour.
This problem reduces to finding the minimum of the functional

$$J[z] = \iint_R \sqrt{1 + z_x^2 + z_y^2}\,dx\,dy,$$

so that Euler’s equation has the form

$$r(1 + q^2) - 2spq + t(1 + p^2) = 0, \tag{25}$$

where

$$p = z_x, \quad q = z_y, \quad r = z_{xx}, \quad s = z_{xy}, \quad t = z_{yy}.$$


Equation (25) has a simple geometric meaning, which we explain by using
the formula

$$M = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{Eg - 2Ff + Ge}{2(EG - F^2)}$$

for the mean curvature of the surface, where $E$, $F$, $G$ and $e$, $f$, $g$ are the
coefficients of the first and second fundamental quadratic forms of the
surface.¹² If the surface is given by an explicit equation of the form
$z = z(x, y)$, then

$$E = 1 + p^2, \qquad F = pq, \qquad G = 1 + q^2,$$
$$e = \frac{r}{\sqrt{1 + p^2 + q^2}}, \qquad f = \frac{s}{\sqrt{1 + p^2 + q^2}}, \qquad g = \frac{t}{\sqrt{1 + p^2 + q^2}},$$

and hence, since $EG - F^2 = 1 + p^2 + q^2$,

$$M = \frac{(1 + p^2)t - 2spq + (1 + q^2)r}{2(1 + p^2 + q^2)^{3/2}}.$$

Here, the numerator coincides with the left-hand side of Euler’s equation
(25). Thus, (25) implies that the mean curvature of the required surface
equals zero. Surfaces with zero mean curvature are called minimal surfaces.
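Equation (25) can be tested on a concrete surface by finite differences. The sketch below is an illustration under assumptions of its own: it evaluates the left-hand side of (25) with central differences for the surface $z = \ln(\cos y/\cos x)$ (the surface of Problem 19 with $a = 1$ and the reference point at the origin), where the residual should vanish, and for the paraboloid $z = x^2 + y^2$, which is not a minimal surface. The step size `d` and the sample point are arbitrary choices.

```python
import math

def minimal_surface_residual(z, x, y, d=1e-3):
    """Left-hand side r(1+q^2) - 2spq + t(1+p^2) of (25), by central differences."""
    p = (z(x + d, y) - z(x - d, y)) / (2 * d)
    q = (z(x, y + d) - z(x, y - d)) / (2 * d)
    r = (z(x + d, y) - 2 * z(x, y) + z(x - d, y)) / d ** 2
    t = (z(x, y + d) - 2 * z(x, y) + z(x, y - d)) / d ** 2
    s = (z(x + d, y + d) - z(x + d, y - d)
         - z(x - d, y + d) + z(x - d, y - d)) / (4 * d ** 2)
    return r * (1 + q * q) - 2 * s * p * q + t * (1 + p * p)

# Candidate minimal surface, defined where cos x and cos y are positive.
scherk = lambda x, y: math.log(math.cos(y) / math.cos(x))

# A surface that is not minimal, for contrast.
paraboloid = lambda x, y: x * x + y * y

res_scherk = minimal_surface_residual(scherk, 0.3, -0.5)
res_paraboloid = minimal_surface_residual(paraboloid, 0.3, -0.5)
```

The first residual is zero up to discretization error, while for the paraboloid the exact value at $(0.3, -0.5)$ is $2(1 + 1) + 2(1 + 0.36) = 6.72$.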


¹² See e.g., D. V. Widder, op. cit., Chap. 3, Sec. 6, and E. Kreyszig, Differential
Geometry, University of Toronto Press, Toronto (1959), Chap. 4. Here, $\kappa_1$ and $\kappa_2$ denote
the principal normal curvatures of the surface.

6. A Simple Variable End Point Problem


There are, of course, many other kinds of variational problems besides
the “simplest” variational problem considered so far, and such problems
will be studied in Chapters 2 and 3. However, this is a suitable place for
acquainting the reader with one of these problems, i.e., the variable end
point problem, a particular case of which can be stated as follows:¹³ Among all
curves whose end points lie on two given vertical lines $x = a$ and $x = b$,
find the curve for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{26}$$

has an extremum.
We begin by calculating the variation $\delta J$ of the functional (26). As
before, $\delta J$ means the principal linear part of the increment

$$\Delta J = J[y + h] - J[y] = \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx.$$

Using Taylor’s theorem to expand the integrand, we obtain

$$\Delta J = \int_a^b (F_y h + F_{y'} h')\,dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h$ and $h'$, and
hence

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx.$$

Here, unlike the fixed end point problem, $h(x)$ need no longer vanish at the
points $a$ and $b$, so that integration by parts now gives¹⁴

$$\delta J = \int_a^b \left(F_y - \frac{d}{dx}F_{y'}\right) h(x)\,dx + F_{y'}\,h(x)\Big|_{x=a}^{x=b}$$
$$= \int_a^b \left(F_y - \frac{d}{dx}F_{y'}\right) h(x)\,dx + F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a). \tag{27}$$

We first consider functions $h(x)$ such that $h(a) = h(b) = 0$. Then, as in
the simplest variational problem, the condition $\delta J = 0$ implies that

$$F_y - \frac{d}{dx}F_{y'} = 0. \tag{28}$$

Therefore, in order for the curve $y = y(x)$ to be a solution of the variable
end point problem, $y$ must be an extremal, i.e., a solution of Euler’s equation.

¹³ The more general case where the end points lie on two given curves $y = \varphi(x)$ and
$y = \psi(x)$ is treated in Sec. 14.
¹⁴ As usual, $f(x)\big|_{x=a}^{x=b}$ stands for $f(b) - f(a)$.

But if $y$ is an extremal, the integral in the expression (27) for $\delta J$ vanishes,
and then the condition $\delta J = 0$ takes the form

$$F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a) = 0,$$

from which it follows that

$$F_{y'}\big|_{x=a} = 0, \qquad F_{y'}\big|_{x=b} = 0, \tag{29}$$

since $h(x)$ is arbitrary. Thus, to solve the variable end point problem, we
must first find a general integral of Euler’s equation (28), and then use the
conditions (29), sometimes called the natural boundary conditions, to determine
the values of the arbitrary constants.

Besides the case of fixed end points and the case of variable end points,
we can also consider the mixed case, where one end is fixed and the other is
variable. For example, suppose we are looking for an extremum of the
functional (26) with respect to the class of curves joining a given point $A$
(with abscissa $a$) and an arbitrary point of the line $x = b$. In this case, the
conditions (29) reduce to the single condition

$$F_{y'}\big|_{x=b} = 0,$$

and $y(a) = A$ serves as the second boundary condition.
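The mixed case can be illustrated with a small numerical experiment. The sketch assumes an integrand that is not taken from the text, $F = y'^2 + y^2$ on $[0, 1]$ with $y(0) = 1$ fixed and the right end free. Euler’s equation is then $y'' = y$, the natural condition is $F_{y'} = 2y' = 0$ at $x = 1$, and the extremal is $y = \cosh(1 - x)/\cosh 1$, for which $J = \tanh 1$. Comparison curves $y + \varepsilon x$ keep the left end fixed but move the right end point; the first-order change in $J$ should vanish, leaving $J[y + \varepsilon x] - J[y] = \tfrac{4}{3}\varepsilon^2$.

```python
import math

def J(y, yp, n=4000):
    """Midpoint-rule value of the integral of y'^2 + y^2 over [0, 1]."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += (yp(x) ** 2 + y(x) ** 2) * dx
    return total

# Extremal satisfying y(0) = 1 and the natural boundary condition y'(1) = 0.
y0 = lambda x: math.cosh(1.0 - x) / math.cosh(1.0)
y0p = lambda x: -math.sinh(1.0 - x) / math.cosh(1.0)

def J_perturbed(eps):
    """J on the comparison curve y0 + eps*x, which moves only the right end point."""
    return J(lambda x: y0(x) + eps * x, lambda x: y0p(x) + eps)

J_extremal = J(y0, y0p)
J_up = J_perturbed(0.1)
J_down = J_perturbed(-0.1)
```

Both `J_up` and `J_down` exceed `J_extremal` by the same amount, which is the behavior expected when the natural boundary condition holds.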


Example. Starting from the point $P = (a, A)$, a heavy particle slides
down a curve in the vertical plane. Find the curve such that the particle
reaches the vertical line $x = b$ ($\ne a$) in the shortest time. (This is a variant
of the brachistochrone problem, p. 3.)


For simplicity, we assume that the original point coincides with the origin
of coordinates. Since the velocity of motion along the curve equals

$$v = \frac{ds}{dt} = \sqrt{1 + y'^2}\,\frac{dx}{dt},$$

we have

$$dt = \frac{\sqrt{1 + y'^2}}{v}\,dx = \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx,$$

so that the transit time $T$ is given by the equation

$$T = \int \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx.$$

The general solution of the corresponding Euler equation consists of a
family of cycloids

$$x = r(\theta - \sin\theta) + c, \qquad y = r(1 - \cos\theta).$$

Since the curve must pass through the origin, we must have $c = 0$. To
determine $r$, we use the second condition

$$F_{y'} = \frac{y'}{\sqrt{2gy}\,\sqrt{1 + y'^2}} = 0 \quad \text{for } x = b,$$

i.e., $y' = 0$ for $x = b$, which means that the tangent to the curve at its right
end point must be horizontal. It follows that $r = b/\pi$, and hence the
required curve is given by the equations

$$x = \frac{b}{\pi}(\theta - \sin\theta), \qquad y = \frac{b}{\pi}(1 - \cos\theta).$$
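The choice $r = b/\pi$ can be compared with other cycloids of the same family. Along a cycloid $x = r(\theta - \sin\theta)$, $y = r(1 - \cos\theta)$ one finds $ds = r\sqrt{2(1 - \cos\theta)}\,d\theta$ and $v = \sqrt{2gr(1 - \cos\theta)}$, so that $dt = \sqrt{r/g}\,d\theta$ and the transit time to the line $x = b$ is $\theta_b\sqrt{r/g}$, where $r(\theta_b - \sin\theta_b) = b$. The sketch below, with an assumed value of $g$ and hypothetical helper names, evaluates this time for several values of $r$.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def transit_time(r, b):
    """Time to slide from the origin to the line x = b along the cycloid with parameter r.

    Requires r >= b / (2*pi), so that the first arch of the cycloid reaches x = b.
    """
    theta_b = bisect(lambda th: r * (th - math.sin(th)) - b, 0.0, 2.0 * math.pi)
    return theta_b * math.sqrt(r / G)

b = 1.0
T_opt = transit_time(b / math.pi, b)   # cycloid with horizontal tangent at x = b
T_others = [transit_time(r, b) for r in (0.2, 0.3, 0.4, 0.6, 1.0, 5.0)]
```

For $r = b/\pi$ the end point corresponds to $\theta_b = \pi$, so `T_opt` equals $\sqrt{\pi b/g}$, and every entry of `T_others` is larger.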


7. The Variational Derivative 


In Sec. 3.2 we introduced the concept of the differential of a functional.
We now introduce the concept of the variational (or functional) derivative,
which plays the same role for functionals as the concept of the partial
derivative plays for functions of $n$ variables. We begin by considering
functionals of the type

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{30}$$

corresponding to the simplest variational problem. Our approach is to
first go from the variational problem to an $n$-dimensional problem, and then
pass to the limit $n \to \infty$.

Thus, we divide the interval $[a, b]$ into $n + 1$ equal subintervals by
introducing the points

$$x_0 = a, \; x_1, \; \ldots, \; x_n, \; x_{n+1} = b, \qquad (x_{i+1} - x_i = \Delta x),$$

and we replace the smooth function $y(x)$ by the polygonal line with vertices

$$(x_0, y_0), \; (x_1, y_1), \; \ldots, \; (x_n, y_n), \; (x_{n+1}, y_{n+1}),$$

where $y_i = y(x_i)$.¹⁵ Then (30) can be approximated by the sum

$$J(y_1, \ldots, y_n) = \sum_{i=0}^{n} F\left(x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x}\right)\Delta x, \tag{31}$$

which is a function of $n$ variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are
fixed.)
Next, we calculate the partial derivatives

$$\frac{\partial J(y_1, \ldots, y_n)}{\partial y_k},$$

and we consider what happens to these derivatives as the number of points
of subdivision increases without limit. Observing that each variable $y_k$
in (31) appears in just two terms, corresponding to $i = k$ and $i = k - 1$,
we find that

$$\frac{\partial J}{\partial y_k} = F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right)\Delta x + F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right) - F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right). \tag{32}$$

¹⁵ This is the method of finite differences (cf. Secs. 1, 40).

As $\Delta x \to 0$, i.e., as the number of points of subdivision increases without
limit, the right-hand side of (32) obviously goes to zero, since it is a quantity
of order $\Delta x$. In order to obtain a limit which is in general nonzero as
$\Delta x \to 0$, we divide (32) by $\Delta x$, obtaining

$$\frac{\partial J}{\partial y_k\,\Delta x} = F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right) - \frac{1}{\Delta x}\left[F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right) - F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right)\right]. \tag{33}$$

We note that the expression $\partial y_k\,\Delta x$ appearing in the denominator on the left
has a direct geometric meaning, and is in fact just the area of the region
lying between the solid and the dashed curves in Figure 3.


FIGURE 3


As $\Delta x \to 0$, the expression (33) converges to the limit

$$\frac{\delta J}{\delta y} \equiv F_y(x, y, y') - \frac{d}{dx}F_{y'}(x, y, y'),$$

called the variational derivative of the functional (30). We see that the
variational derivative $\delta J/\delta y$ is just the left-hand side of Euler’s equation
(28), and hence the meaning of Euler’s equation is just that the variational
derivative of the functional under consideration should vanish at every point.
This is the analog of the situation encountered in elementary analysis, where
a necessary condition for a function of $n$ variables to have an extremum is
that all its partial derivatives vanish.
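The passage from the sum (31) to the variational derivative can be reproduced directly. The sketch below builds the sum (31) for a list of ordinates, differentiates it numerically with respect to a single ordinate $y_k$, and divides by $\Delta x$. The test case is an assumption made for illustration: $F = y'^2 + y^2$ and $y = \sin x$ on $[0, \pi]$, for which $F_y - \frac{d}{dx}F_{y'} = 2y - 2y'' = 4\sin x$.

```python
import math

def discrete_J(ys, a, b, F):
    """Sum (31) for ordinates ys = [y_0, ..., y_{n+1}] on an equally spaced grid over [a, b]."""
    intervals = len(ys) - 1
    dx = (b - a) / intervals
    return sum(F(a + i * dx, ys[i], (ys[i + 1] - ys[i]) / dx) * dx
               for i in range(intervals))

def variational_derivative(ys, k, a, b, F, bump=1e-4):
    """Approximate (dJ/dy_k) / dx by a central difference in the single ordinate y_k."""
    dx = (b - a) / (len(ys) - 1)
    up, down = list(ys), list(ys)
    up[k] += bump
    down[k] -= bump
    return (discrete_J(up, a, b, F) - discrete_J(down, a, b, F)) / (2.0 * bump * dx)

# Illustrative data (assumptions for this sketch): F = y'^2 + y^2, y = sin x on [0, pi].
F = lambda x, y, yp: yp * yp + y * y
intervals = 400
ys = [math.sin(math.pi * i / intervals) for i in range(intervals + 1)]

k = 100                                   # grid point x_k = pi/4
approx = variational_derivative(ys, k, 0.0, math.pi, F)
exact = 4.0 * math.sin(math.pi / 4.0)     # value of F_y - d/dx F_{y'} at x = pi/4
```

With 400 subintervals the two values agree to several decimal places, and the agreement improves as the grid is refined.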

In the general case, the variational derivative is defined as follows: Let
$J[y]$ be a functional depending on the function $y(x)$, and suppose we give
$y(x)$ an increment $h(x)$ which is different from zero only in the neighborhood
of a point $x_0$. Dividing the corresponding increment $J[y + h] - J[y]$ of
the functional by the area $\Delta\sigma$ lying between the curve $y = h(x)$ and the
$x$-axis,¹⁶ we obtain the ratio

$$\frac{J[y + h] - J[y]}{\Delta\sigma}. \tag{34}$$

Next, we let the area $\Delta\sigma$ go to zero in such a way that both $\max |h(x)|$ and
the length of the interval in which $h(x)$ is nonvanishing go to zero. Then,
if the ratio (34) converges to a limit as $\Delta\sigma \to 0$, this limit is called the
variational derivative of the functional $J[y]$ at the point $x_0$ [for the curve
$y = y(x)$], and is denoted by

$$\frac{\delta J}{\delta y}\bigg|_{x = x_0}.$$

It can be shown that the analogs of all the familiar rules obeyed by ordinary
derivatives (e.g., the formulas for differentiating sums and products of functions,
composite functions, etc.) are valid for variational derivatives.


Remark. It is clear from the definition of the variational derivative
that if $h(x)$ is different from zero in a neighborhood of the point $x_0$, and if
$\Delta\sigma$ is the area between the curve $y = h(x)$ and the $x$-axis, then

$$\Delta J \equiv J[y + h] - J[y] = \left\{\frac{\delta J}{\delta y}\bigg|_{x = x_0} + \varepsilon\right\}\Delta\sigma,$$

where $\varepsilon \to 0$ as both $\max |h(x)|$ and the length of the interval in which $h(x)$
is nonvanishing go to zero. It follows that in terms of the variational
derivative, the differential or variation of the functional $J[y]$ at the point $x_0$
[for the curve $y = y(x)$] is given by the formula

$$\delta J = \frac{\delta J}{\delta y}\bigg|_{x = x_0}\Delta\sigma.$$


8. Invariance of Euler’s Equation 


Suppose that instead of the rectangular plane coordinates $x$ and $y$, we
introduce curvilinear coordinates $u$ and $v$, where

$$x = x(u, v), \quad y = y(u, v), \qquad \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} \ne 0. \tag{35}$$

Then the curve given by the equation $y = y(x)$ in the $xy$-plane corresponds
to the curve given by some equation

$$v = v(u)$$

in the $uv$-plane.

¹⁶ $\Delta\sigma$ can also be regarded as the area between the curves $y = y(x)$ and $y = y(x) + h(x)$.

When we make the change of variables (35), the functional

$$J[y] = \int_a^b F(x, y, y')\,dx$$

goes into the functional

$$J_1[v] = \int_{a_1}^{b_1} F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v')\,du = \int_{a_1}^{b_1} F_1(u, v, v')\,du,$$

where

$$F_1(u, v, v') = F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v').$$


We now show that if $y = y(x)$ satisfies the Euler equation

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0 \tag{36}$$

corresponding to the original functional $J[y]$, then $v = v(u)$ satisfies the
Euler equation

$$\frac{\partial F_1}{\partial v} - \frac{d}{du}\frac{\partial F_1}{\partial v'} = 0 \tag{37}$$

corresponding to the new functional $J_1[v]$. To prove this, we use the concept
of the variational derivative, introduced in the preceding section. Let $\Delta\sigma$
denote the area bounded by the curves $y = y(x)$ and $y = y(x) + h(x)$, and
let $\Delta\sigma_1$ denote the area bounded by the corresponding curves $v = v(u)$ and
$v = v(u) + \eta(u)$ in the $uv$-plane. By the standard formula for the transformation
of areas, the limit as $\Delta\sigma, \Delta\sigma_1 \to 0$ of the ratio $\Delta\sigma/\Delta\sigma_1$ approaches
the Jacobian

$$\begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix},$$

which by hypothesis is nonzero. Thus, if

$$\lim_{\Delta\sigma \to 0} \frac{J[y + h] - J[y]}{\Delta\sigma} = 0,$$

then

$$\lim_{\Delta\sigma_1 \to 0} \frac{J_1[v + \eta] - J_1[v]}{\Delta\sigma_1} = 0$$

as well. It follows that $v(u)$ satisfies (37) if $y(x)$ satisfies (36). In other
words, whether or not a curve is an extremal is a property which is independent
of the choice of the coordinate system.

In solving Euler’s equation, changes of variables can often be used to
advantage. Because of the invariance property just proved, the change of
variables can be made directly in the integral representing the functional
rather than in Euler’s equation, and we can then write Euler’s equation for the
new integral.

Example. Suppose we are looking for the extremals of the functional

$$J[r] = \int_{\varphi_0}^{\varphi_1} \sqrt{r^2 + r'^2}\,d\varphi, \tag{38}$$

where $r = r(\varphi)$. The corresponding Euler equation has the form

$$\frac{r}{\sqrt{r^2 + r'^2}} - \frac{d}{d\varphi}\frac{r'}{\sqrt{r^2 + r'^2}} = 0.$$

The change of variables

$$x = r\cos\varphi, \qquad y = r\sin\varphi$$

transforms (38) into an integral of the form

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2}\,dx,$$

which has the Euler equation

$$y'' = 0$$

with general solution

$$y = \alpha x + \beta.$$

Therefore, the solution of (38) is

$$r\sin\varphi = \alpha r\cos\varphi + \beta.$$
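The invariance can be checked in this example by substituting the straight line back into the polar form of Euler’s equation. The sketch below solves $r\sin\varphi = \alpha r\cos\varphi + \beta$ for $r(\varphi)$, uses the exact derivative $r'(\varphi)$, and approximates $\frac{d}{d\varphi}$ of $r'/\sqrt{r^2 + r'^2}$ by a central difference. The values of $\alpha$ and $\beta$ and the sample angles are arbitrary illustrative choices, restricted to angles where $\sin\varphi - \alpha\cos\varphi > 0$.

```python
import math

ALPHA, BETA = 0.5, 2.0   # the line y = ALPHA*x + BETA (illustrative values)

def r(phi):
    """Polar equation of the line: r = BETA / (sin(phi) - ALPHA*cos(phi))."""
    return BETA / (math.sin(phi) - ALPHA * math.cos(phi))

def rp(phi):
    """Exact derivative dr/dphi."""
    denom = math.sin(phi) - ALPHA * math.cos(phi)
    return -BETA * (math.cos(phi) + ALPHA * math.sin(phi)) / denom ** 2

def euler_residual(phi, d=1e-5):
    """Left-hand side of the polar Euler equation for the functional (38)."""
    F_rp = lambda p: rp(p) / math.hypot(r(p), rp(p))       # partial of F w.r.t. r'
    dF_rp = (F_rp(phi + d) - F_rp(phi - d)) / (2.0 * d)    # d/dphi by central difference
    F_r = r(phi) / math.hypot(r(phi), rp(phi))             # partial of F w.r.t. r
    return F_r - dF_rp

angles = (1.0, 1.5, 2.0, 2.5)
residuals = [euler_residual(phi) for phi in angles]
```

All residuals vanish up to discretization error, in agreement with the fact that the extremals found in rectangular coordinates remain extremals in polar coordinates.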


PROBLEMS 


1. Use the method of finite differences (Sec. 1) to find the shortest plane curve
joining two points $A$ and $B$.


2. A set $\mathscr{M}$ in a normed linear space $\mathscr{R}$ is said to be convex if $\mathscr{M}$ contains all
elements of the form $\alpha x + \beta y$, where $\alpha, \beta \ge 0$, $\alpha + \beta = 1$, provided that $\mathscr{M}$
contains $x$ and $y$. Prove that the set of all elements $x \in \mathscr{R}$ satisfying the
inequality $\|x - x_0\| \le c$, where $x_0$ is a fixed element of $\mathscr{R}$ and $c > 0$, is convex.


3. Show that the set of all continuous functions defined on the
interval $[a, b]$, equipped with the norm

$$\|y\| = \left\{\int_a^b |y(x)|^2\,dx\right\}^{1/2},$$

forms a normed linear space.


4. An infinite sequence $y_1, y_2, \ldots$ of elements of a normed
linear space $\mathscr{R}$ is called a Cauchy sequence (or fundamental sequence) if, given
any $\varepsilon > 0$, there exists an integer $N = N(\varepsilon)$ such that $\|y_m - y_n\| < \varepsilon$, provided
that $m > N$, $n > N$. A normed linear space $\mathscr{R}$ is said to be complete if every
Cauchy sequence in $\mathscr{R}$ converges to some element in $\mathscr{R}$. Prove that the space
introduced in the preceding problem is not complete, but that the space
$\mathscr{C}(a, b)$ introduced in Sec. 2 is complete.


Comment. See e.g., G. E. Shilov, op. cit., p. 249.

5. Prove that any norm defined on a linear space $\mathscr{R}$ is a continuous functional
on $\mathscr{R}$.


6. Suppose the norm of the space $\mathscr{D}_n(a, b)$ is defined as

$$\|y\| = \max_{a \le x \le b} \{|y(x)|, |y'(x)|, \ldots, |y^{(n)}(x)|\},$$

instead of

$$\|y\| = \sum_{i=0}^{n} \max_{a \le x \le b} |y^{(i)}(x)|$$

as on p. 7. Prove that any functional on $\mathscr{D}_n(a, b)$ which is continuous with
respect to one of these norms is continuous with respect to the other.


7. Let $J[y]$ be the arc-length functional, defined for all $y \in \mathscr{D}_1(a, b)$. Show
that $J[y]$ is lower semicontinuous with respect to the norm of the space
$\mathscr{C}(a, b)$.


Comment. As remarked in footnote 5, p. 8, $J[y]$ is not continuous with
respect to the norm of $\mathscr{C}(a, b)$.


8. Let $\varphi[h]$ be a linear functional defined on a normed linear space $\mathscr{R}$. Prove
that if $\varphi[h]$ is continuous for $h = 0$, it is continuous for all $h \in \mathscr{R}$.


9. Prove that a linear functional $\varphi[h]$ cannot have an extremum unless
$\varphi[h] \equiv 0$.


10. Prove that if two linear functionals $\varphi[h]$ and $\psi[h]$ defined on the same
space vanish on the same set of elements, then $\varphi[h] = \lambda\psi[h]$, where $\lambda$ is a
constant.


11. Show that constants $c_0$ and $c_1$ can always be chosen satisfying the
conditions (7) used to prove Lemma 3, p. 10.


12. Prove that the square of a differentiable functional is differentiable, and 
write a formula for its differential (variation). 


13. Prove that if two differentiable functionals defined on the same normed 
linear space have the same differential at every point of the space, then they 
differ by a constant. 


14. Analyze the variational problems corresponding to the following functionals,
where in each case $y(0) = 0$, $y(1) = 1$:

a) $\int_0^1 y'\,dx$; b) $\int_0^1 yy'\,dx$; c) $\int_0^1 xyy'\,dx$.

15. Find the extremals of the following functionals:

a) $\int_a^b (y^2 + y'^2 - 2y\sin x)\,dx$; b) $\int_a^b \frac{y'^2}{x^3}\,dx$;

c) $\int_a^b (y^2 - y'^2 - 2y\cosh x)\,dx$; d) $\int_a^b (y^2 + y'^2 + 2ye^x)\,dx$;

e) $\int_a^b (y^2 - y'^2 - 2y\sin x)\,dx$.


Ans. b) $y = C_1x^4 + C_2$; d) $y = \tfrac{1}{2}xe^x + C_1e^x + C_2e^{-x}$.

16. Prove the uniqueness part of Bernstein’s theorem (p. 16).


Hint. Let $\Delta(x) = \varphi_2(x) - \varphi_1(x)$, where $\varphi_1(x)$ and $\varphi_2(x)$ are two solutions
of (15), write an expression for $\Delta''(x)$ and use the condition $F_y(x, y, y') > k$.


17. Prove that one and only one extremal of each of the functionals

$$\int e^{-2y^2}(y'^2 - 1)\,dx, \qquad \int (y^2 + y'\tan^{-1}y' - \ln\sqrt{1 + y'^2})\,dx$$

passes through any two points of the plane with different abscissas.


Hint. Apply Bernstein’s theorem.


18. Find the general solution of the Euler equation corresponding to the
functional

$$J[y] = \int_a^b f(x) \sqrt{1 + y'^2} \, dx,$$

and investigate the special cases $f(x) = \sqrt{x}$ and $f(x) = x$.
Comment. The case $f(x) = 1/x$ is treated in Example 1, p. 19.


19. Find all minimal surfaces whose equations have the form $z = \varphi(x) + \psi(y)$.

Ans. $z = Ax + By + C$, $\quad e^{a(z - z_0)} = \dfrac{\cos a(y - y_0)}{\cos a(x - x_0)}$.


20. Which curve minimizes the integral

$$\int_0^1 \left( \tfrac{1}{2} y'^2 + y y' + y' + y \right) dx,$$

when the values of $y$ are not specified at the end points?
Ans. $y = \tfrac{1}{2}(x^2 - 3x + 1)$.


21. Calculate the variational derivative at the point $x_0$ of the quadratic
functional

$$J[y] = \int_a^b \int_a^b K(s, t) \, y(s) \, y(t) \, ds \, dt.$$

22. Find the extremals of the functional

$$\int \sqrt{x^2 + y^2} \, \sqrt{1 + y'^2} \, dx.$$

Hint. Use polar coordinates.


Ans. $x^2 \cos \alpha + 2xy \sin \alpha - y^2 \cos \alpha = \beta$, where $\alpha$ and $\beta$ are constants.
FURTHER GENERALIZATIONS


In this chapter, we consider some further generalizations of the simplest 
variational problem. These include variational problems in spaces of dimen- 
sion greater than two (Sec. 9), problems in parametric form (Sec. 10), 
problems involving higher derivatives (Sec. 11), and problems with subsidiary 
conditions (Sec. 12). 


9. The Fixed End Point Problem for n Unknown Functions 


Let $F(x, y_1, \ldots, y_n, z_1, \ldots, z_n)$ be a function with continuous first and
second (partial) derivatives with respect to all its arguments. Consider
the problem of finding necessary conditions for an extremum of a functional
of the form

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx, \tag{1}$$

which depends on $n$ continuously differentiable functions $y_1(x), \ldots, y_n(x)$
satisfying the boundary conditions

$$y_i(a) = A_i, \qquad y_i(b) = B_i \qquad (i = 1, \ldots, n). \tag{2}$$

In other words, we are looking for an extremum of the functional (1) defined
on the set of smooth curves joining two fixed points in $(n + 1)$-dimensional
Euclidean space $\mathcal{E}_{n+1}$. The problem of finding geodesics, i.e.,
shortest curves joining two points of some manifold, is of this type. The
same kind of problem arises in geometric optics, in finding the paths along
which light rays propagate in an inhomogeneous medium. In fact, according
to Fermat's principle, light goes from a point $P_0$ to a point $P_1$ along the
path for which the transit time is the smallest.
To find necessary conditions for the functional (1) to have an extremum,
we first calculate its variation. Suppose we replace each $y_i(x)$ by a "varied"
function $y_i(x) + h_i(x)$. By the variation $\delta J$ of the functional $J[y_1, \ldots, y_n]$,
we mean the expression which is linear in $h_i$, $h_i'$ $(i = 1, \ldots, n)$ and differs
from the increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to $h_i$, $h_i'$ $(i = 1, \ldots, n)$. Since
both $y_i(x)$ and $y_i(x) + h_i(x)$ satisfy the boundary conditions (2), for each $i$,
it is clear that

$$h_i(a) = h_i(b) = 0 \qquad (i = 1, \ldots, n).$$

We now use Taylor's theorem, obtaining

$$\begin{aligned}
\Delta J &= \int_a^b [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)] \, dx \\
&= \int_a^b \sum_{i=1}^n (F_{y_i} h_i + F_{y_i'} h_i') \, dx + \cdots,
\end{aligned}$$

where the dots denote terms of order higher than 1 relative to $h_i$, $h_i'$
$(i = 1, \ldots, n)$. The last integral on the right represents the principal
linear part of the increment $\Delta J$, and hence the variation of $J[y_1, \ldots, y_n]$ is

$$\delta J = \int_a^b \sum_{i=1}^n (F_{y_i} h_i + F_{y_i'} h_i') \, dx.$$


Since all the increments $h_i(x)$ are independent, we can choose one of them
quite arbitrarily (as long as the boundary conditions are satisfied), setting
all the others equal to zero. Therefore, the necessary condition $\delta J = 0$ for
an extremum implies

$$\int_a^b (F_{y_i} h_i + F_{y_i'} h_i') \, dx = 0 \qquad (i = 1, \ldots, n).$$

Using Lemma 4 of Sec. 3.1, we obtain the following system of Euler
equations:

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{3}$$

Since (3) is a system of $n$ second-order differential equations, its general
solution contains $2n$ arbitrary constants, which are determined from the
boundary conditions (2). Thus, we have proved the following


THEOREM. A necessary condition for the curve

$$y_i = y_i(x) \qquad (i = 1, \ldots, n)$$

to be an extremal of the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx$$

is that the functions $y_i(x)$ satisfy the Euler equations (3).
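
As a numerical illustration of the system (3), the following is a minimal sketch for $n = 2$ with the integrand $F = y'^2 + z'^2 + y^2 + z^2$ on $[0, 1]$. The integrand, the end point values, the grid size and the trial perturbation are assumptions chosen for the example, not taken from the text. For this $F$ the Euler equations are $y'' = y$, $z'' = z$, and the sketch checks that the functional grows when either component of the extremal is perturbed by a function vanishing at the end points.

```python
import math

def functional(y, z, a=0.0, b=1.0, n=2000):
    """Approximate J[y, z] = int_a^b (y'^2 + z'^2 + y^2 + z^2) dx on n cells."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x0, x1 = a + k * dx, a + (k + 1) * dx
        dy = (y(x1) - y(x0)) / dx       # finite-difference slope on the cell
        dz = (z(x1) - z(x0)) / dx
        ym = 0.5 * (y(x0) + y(x1))      # average of the end values on the cell
        zm = 0.5 * (z(x0) + z(x1))
        total += (dy * dy + dz * dz + ym * ym + zm * zm) * dx
    return total

# Solutions of y'' = y, z'' = z with the (assumed) end point values
# y(0) = 0, y(1) = 1, z(0) = 1, z(1) = 0.
def y_ext(x):
    return math.sinh(x) / math.sinh(1.0)

def z_ext(x):
    return math.sinh(1.0 - x) / math.sinh(1.0)

def bump(x):
    """Trial increment h(x); it vanishes at both end points."""
    return math.sin(math.pi * x)

J0 = functional(y_ext, z_ext)
J_y = functional(lambda x: y_ext(x) + 0.1 * bump(x), z_ext)   # vary y only
J_z = functional(y_ext, lambda x: z_ext(x) - 0.1 * bump(x))   # vary z only
print(J0, J_y, J_z)
```

Varying one function at a time mirrors the argument in the text, where one increment $h_i(x)$ is chosen arbitrarily and the others are set equal to zero.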
Remark 1. We have just shown how to find a well-defined system of
Euler equations (3) for every functional of the form (1). However, two
different integrands $F$ can lead to the same set of Euler equations. In fact,
let

$$\Phi = \Phi(x, y_1, \ldots, y_n)$$

be any twice differentiable function, and let

$$\Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \frac{\partial \Phi}{\partial x} + \sum_{i=1}^n \frac{\partial \Phi}{\partial y_i} y_i'. \tag{4}$$

Then we find at once by direct calculation that

$$\frac{\partial \Psi}{\partial y_i} - \frac{d}{dx} \left( \frac{\partial \Psi}{\partial y_i'} \right) \equiv 0,$$

and hence the functionals

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx \tag{5}$$

and

$$\int_a^b [F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') + \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n')] \, dx \tag{6}$$

lead to the same system of Euler equations.
Given any curve $y_i = y_i(x)$, the function (4) is just the derivative

$$\frac{d}{dx} \Phi[x, y_1(x), \ldots, y_n(x)].$$

Therefore, the integral

$$\int_a^b \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx = \int_a^b \frac{d\Phi}{dx} \, dx$$

takes the same value along all curves satisfying the boundary conditions (2).
In other words, the functionals (5) and (6), defined on the class of functions
satisfying (2), differ only by a constant. In particular, we can choose $\Phi$
in such a way that this constant vanishes (but $\Psi \not\equiv 0$).
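
The claim that the integral of $\Psi$ depends only on the end points can be checked numerically. The following is a minimal sketch for $n = 1$, assuming the illustrative choice $\Phi(x, y) = x y^2$ (so that $\Psi = y^2 + 2xyy'$) and two hypothetical curves joining $(0, 1)$ to $(1, 2)$; none of these choices come from the text.

```python
def Phi(x, y):
    """Illustrative choice of the function Phi(x, y)."""
    return x * y * y

def Psi(x, y, yp):
    """Formula (4) for n = 1: Psi = dPhi/dx + (dPhi/dy) * y'."""
    return y * y + 2.0 * x * y * yp

def integral_of_Psi(y, yp, a, b, n=20000):
    """Midpoint rule for int_a^b Psi(x, y(x), y'(x)) dx."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += Psi(x, y(x), yp(x)) * dx
    return total

# Two different curves with the same end points (0, 1) and (1, 2).
v_line = integral_of_Psi(lambda x: 1.0 + x, lambda x: 1.0, 0.0, 1.0)
v_parab = integral_of_Psi(lambda x: 1.0 + x * x, lambda x: 2.0 * x, 0.0, 1.0)

# Both integrals should agree with Phi(b, B) - Phi(a, A).
expected = Phi(1.0, 2.0) - Phi(0.0, 1.0)
print(v_line, v_parab, expected)
```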


Remark 2. Two functionals are said to be equivalent if they have the 
same extremals. According to Remark 1, two functionals of the form (1) 
are equivalent if their integrands differ by a function of the form (4). It is 
also clear that two functionals of this form are equivalent if their integrands
differ by a constant factor $c \neq 0$. More generally, the functional (5) is
equivalent to the functional (6) with $F$ replaced by $cF$.


Example 1. Propagation of light in an inhomogeneous medium. Suppose
that three-dimensional space is filled with an optically inhomogeneous
medium, such that the velocity of propagation of light at each point is some
function $v(x, y, z)$ of the coordinates of the point. According to Fermat's
principle (see p. 34), light goes from one point to another along the curve
for which the transit time of the light is the smallest. If the curve joining
two points $A$ and $B$ is specified by the equations

$$y = y(x), \qquad z = z(x),$$

the time it takes light to traverse the curve equals

$$\int_a^b \frac{\sqrt{1 + y'^2 + z'^2}}{v(x, y, z)} \, dx.$$

Writing the system of Euler equations for this functional, i.e.,

$$\begin{aligned}
\frac{\partial v}{\partial y} \frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx} \frac{y'}{v \sqrt{1 + y'^2 + z'^2}} &= 0, \\
\frac{\partial v}{\partial z} \frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx} \frac{z'}{v \sqrt{1 + y'^2 + z'^2}} &= 0,
\end{aligned}$$

we obtain the differential equations for the curves along which the light
propagates.


Example 2. Geodesics. Suppose we have a surface $\sigma$ specified by a vector
equation¹

$$\mathbf{r} = \mathbf{r}(u, v). \tag{7}$$

The shortest curve lying on $\sigma$ and connecting two points of $\sigma$ is called the
geodesic connecting the two points. Clearly, the equations for the geodesics
of $\sigma$ are the Euler equations of the corresponding variational problem, i.e.,
the problem of finding the minimum distance (measured along $\sigma$) between
two points of $\sigma$.

A curve lying on the surface (7) can be specified by the equations

$$u = u(t), \qquad v = v(t).$$

The arc length between the points corresponding to the values $t_0$ and $t_1$
of the parameter $t$ equals

$$J[u, v] = \int_{t_0}^{t_1} \sqrt{E u'^2 + 2F u' v' + G v'^2} \, dt, \tag{8}$$

where $E$, $F$ and $G$ are the coefficients of the first fundamental (quadratic)
form of the surface (7), i.e.,²

$$E = \mathbf{r}_u \cdot \mathbf{r}_u, \qquad F = \mathbf{r}_u \cdot \mathbf{r}_v, \qquad G = \mathbf{r}_v \cdot \mathbf{r}_v.$$


¹ Here, vectors are indicated by boldface letters, and $\mathbf{a} \cdot \mathbf{b}$ denotes the scalar product
of the vectors $\mathbf{a}$ and $\mathbf{b}$.
² See D. V. Widder, op. cit., p. 110.
Writing the Euler equations for the functional (8), we obtain

$$\begin{aligned}
\frac{E_u u'^2 + 2F_u u' v' + G_u v'^2}{\sqrt{E u'^2 + 2F u' v' + G v'^2}} - \frac{d}{dt} \frac{2(E u' + F v')}{\sqrt{E u'^2 + 2F u' v' + G v'^2}} &= 0, \\
\frac{E_v u'^2 + 2F_v u' v' + G_v v'^2}{\sqrt{E u'^2 + 2F u' v' + G v'^2}} - \frac{d}{dt} \frac{2(F u' + G v')}{\sqrt{E u'^2 + 2F u' v' + G v'^2}} &= 0.
\end{aligned}$$

As a very simple illustration of these considerations, we now find the
geodesics of the circular cylinder

$$\mathbf{r} = (a \cos \varphi, a \sin \varphi, z), \tag{9}$$

where the variables $\varphi$ and $z$ play the role of the parameters $u$ and $v$. Since
the coefficients of the first fundamental form of the cylinder (9) are

$$E = a^2, \qquad F = 0, \qquad G = 1,$$

the geodesics of the cylinder have the equations

$$\frac{d}{dt} \frac{a^2 \varphi'}{\sqrt{a^2 \varphi'^2 + z'^2}} = 0, \qquad \frac{d}{dt} \frac{z'}{\sqrt{a^2 \varphi'^2 + z'^2}} = 0,$$

i.e.,

$$\frac{a^2 \varphi'}{\sqrt{a^2 \varphi'^2 + z'^2}} = C_1, \qquad \frac{z'}{\sqrt{a^2 \varphi'^2 + z'^2}} = C_2.$$

Dividing the second of these equations by the first, we obtain

$$\frac{dz}{d\varphi} = c_1,$$

which has the solution

$$z = c_1 \varphi + c_2,$$

representing a two-parameter family of helical lines lying on the cylinder (9).
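
The helix result can be illustrated numerically. The following is a minimal sketch that evaluates the arc-length functional (8) on the cylinder ($E = a^2$, $F = 0$, $G = 1$) for a helix and for a nearby curve with the same end points. The radius, the end points and the perturbation are assumptions chosen for the example.

```python
import math

def length_on_cylinder(phi, z, a, n=4000):
    """Approximate functional (8) with E = a^2, F = 0, G = 1 for t in [0, 1]."""
    total = 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        dphi = phi(t1) - phi(t0)
        dz = z(t1) - z(t0)
        total += math.sqrt(a * a * dphi * dphi + dz * dz)
    return total

a = 2.0  # cylinder radius (illustrative value)

# Helix z = c1 * phi + c2 joining (phi, z) = (0, 0) to (pi/2, 1).
helix = length_on_cylinder(lambda t: 0.5 * math.pi * t, lambda t: t, a)

# A perturbed curve on the same cylinder with the same end points.
wobbly = length_on_cylinder(lambda t: 0.5 * math.pi * t,
                            lambda t: t + 0.2 * math.sin(math.pi * t), a)
print(helix, wobbly)
```

For these end points the helix length agrees with $\sqrt{a^2 (\Delta\varphi)^2 + (\Delta z)^2}$, which is what one expects after unrolling the cylinder onto a plane.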

The concept of a geodesic can be defined not only for surfaces, but also 
for higher-dimensional manifolds. Clearly, finding the geodesics of an 
n-dimensional manifold reduces to solving a variational problem for a 
functional depending on n functions. 


10. Variational Problems in Parametric Form 


So far, we have considered functionals of curves given by explicit equations,
e.g., by equations of the form

$$y = y(x) \tag{10}$$

in the two-dimensional case. However, it is often more convenient to
consider functionals of curves given in parametric form, and in fact we have
already encountered this case in Example 2 of the preceding section (involving
geodesics on a surface). Moreover, in problems involving closed curves
(like the isoperimetric problem mentioned on p. 3), it is usually impossible
to get along without representing the curves in parametric form. Thus,
in this section, we extend our previous results to the case where the curves
are given parametrically, confining ourselves to the simplest variational
problem.
Suppose that in the functional

$$\int_{x_0}^{x_1} F(x, y, y') \, dx, \tag{11}$$

we wish to regard the argument $y$ as a curve which is given in parametric
form, rather than in the form (10). Then (11) can be written as

$$\int_{t_0}^{t_1} F\left[ x(t), y(t), \frac{\dot{y}(t)}{\dot{x}(t)} \right] \dot{x}(t) \, dt = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y}) \, dt \tag{12}$$

(where the overdot denotes differentiation with respect to $t$), i.e., as a
functional depending on two unknown functions $x(t)$ and $y(t)$. The
function $\Phi$ appearing in the right-hand side of (12) does not involve $t$
explicitly, and is positive-homogeneous of degree 1 in $\dot{x}(t)$ and $\dot{y}(t)$, which
means that

$$\Phi(x, y, \lambda \dot{x}, \lambda \dot{y}) = \lambda \Phi(x, y, \dot{x}, \dot{y}) \tag{13}$$

for every $\lambda > 0$.³

Conversely, let

$$\int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y}) \, dt$$

be a functional whose integrand $\Phi$ does not involve $t$ explicitly and is
positive-homogeneous of degree 1 in $\dot{x}$ and $\dot{y}$. We now show that the value of such
a functional depends only on the curve in the $xy$-plane defined by the
parametric equations $x = x(t)$, $y = y(t)$, and not on the functions $x(t)$, $y(t)$
themselves, i.e., that if we go from $t$ to some new parameter $\tau$ by setting

$$t = t(\tau),$$

where $dt/d\tau > 0$ and the interval $[t_0, t_1]$ goes into $[\tau_0, \tau_1]$, then

$$\int_{\tau_0}^{\tau_1} \Phi\left( x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau} \right) d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y}) \, dt.$$


³ The example of the arc-length functional

$$\int_{t_0}^{t_1} \sqrt{\dot{x}^2 + \dot{y}^2} \, dt,$$

whose value does not depend on the direction in which the curve $x = x(t)$, $y = y(t)$ is
traversed, shows why (13) does not hold for $\lambda < 0$.
In fact, since $\Phi$ is positive-homogeneous of degree 1 in $\dot{x}$ and $\dot{y}$, it follows
that

$$\begin{aligned}
\int_{\tau_0}^{\tau_1} \Phi\left( x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau} \right) d\tau
&= \int_{\tau_0}^{\tau_1} \Phi\left( x, y, \dot{x} \frac{dt}{d\tau}, \dot{y} \frac{dt}{d\tau} \right) d\tau \\
&= \int_{\tau_0}^{\tau_1} \Phi(x, y, \dot{x}, \dot{y}) \frac{dt}{d\tau} \, d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y}) \, dt,
\end{aligned}$$

as asserted. Thus, we have proved the following


THEOREM. A necessary and sufficient condition for the functional

$$\int_{t_0}^{t_1} \Phi(t, x, y, \dot{x}, \dot{y}) \, dt$$

to depend only on the curve in the $xy$-plane defined by the parametric
equations $x = x(t)$, $y = y(t)$ and not on the choice of the parametric
representation of the curve, is that the integrand $\Phi$ should not involve
$t$ explicitly and should be a positive-homogeneous function of degree 1 in
$\dot{x}$ and $\dot{y}$.
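
The role of homogeneity in this theorem can be seen numerically. The following is a minimal sketch comparing two integrands along a quarter of the unit circle under two parameterizations: the arc-length integrand (positive-homogeneous of degree 1) and the integrand $\dot{x}^2 + \dot{y}^2$ (homogeneous of degree 2). The curve, the change of parameter $t = (\tau + \tau^2)/2$ and the helper names are assumptions made for illustration.

```python
import math

def integrate(Phi, x, y, t0=0.0, t1=1.0, n=20000):
    """Approximate int Phi(x, y, xdot, ydot) dt with finite-difference velocities."""
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        s0, s1 = t0 + k * dt, t0 + (k + 1) * dt
        xd = (x(s1) - x(s0)) / dt
        yd = (y(s1) - y(s0)) / dt
        sm = 0.5 * (s0 + s1)
        total += Phi(x(sm), y(sm), xd, yd) * dt
    return total

def arc(x, y, xd, yd):
    return math.sqrt(xd * xd + yd * yd)     # homogeneous of degree 1

def energy(x, y, xd, yd):
    return xd * xd + yd * yd                # homogeneous of degree 2

# Quarter of the unit circle with parameter t in [0, 1].
def x_t(t):
    return math.cos(0.5 * math.pi * t)

def y_t(t):
    return math.sin(0.5 * math.pi * t)

# Same curve with parameter tau in [0, 1], t = (tau + tau^2) / 2, dt/dtau > 0.
def t_of_tau(tau):
    return 0.5 * (tau + tau * tau)

def x_tau(tau):
    return x_t(t_of_tau(tau))

def y_tau(tau):
    return y_t(t_of_tau(tau))

arc_t, arc_tau = integrate(arc, x_t, y_t), integrate(arc, x_tau, y_tau)
energy_t, energy_tau = integrate(energy, x_t, y_t), integrate(energy, x_tau, y_tau)
print(arc_t, arc_tau, energy_t, energy_tau)
```

The two arc-length values agree with each other, while the two values of the degree-2 integrand differ, so only the first functional depends on the curve alone.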

Now, suppose some parameterization of the curve $y = y(x)$ reduces the
functional (11) to the form

$$\int_{t_0}^{t_1} F\left( x, y, \frac{\dot{y}}{\dot{x}} \right) \dot{x} \, dt = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y}) \, dt. \tag{14}$$

The variational problem for the right-hand side of (14) leads to the pair of
Euler equations

$$\Phi_x - \frac{d}{dt} \Phi_{\dot{x}} = 0, \qquad \Phi_y - \frac{d}{dt} \Phi_{\dot{y}} = 0, \tag{15}$$

which must be equivalent to the single Euler equation

$$F_y - \frac{d}{dx} F_{y'} = 0,$$

corresponding to the variational problem for the original functional (11).
Hence, the equations (15) cannot be independent, and in fact it is easily
verified that they are connected by the identity

$$\dot{x} \left( \Phi_x - \frac{d}{dt} \Phi_{\dot{x}} \right) + \dot{y} \left( \Phi_y - \frac{d}{dt} \Phi_{\dot{y}} \right) = 0. \tag{16}$$

We shall discuss this point further in Sec. 37.5.
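
As a quick check of the identity (16) in one special case, consider the arc-length integrand $\Phi = \sqrt{\dot{x}^2 + \dot{y}^2}$. This choice of $\Phi$ is an illustration only; the general verification is a separate exercise.

```latex
% Special case Phi = sqrt(xdot^2 + ydot^2), chosen for illustration.
\Phi_x = \Phi_y = 0, \qquad
\Phi_{\dot{x}} = \frac{\dot{x}}{\Phi}, \qquad
\Phi_{\dot{y}} = \frac{\dot{y}}{\Phi}, \qquad
\frac{d\Phi}{dt} = \frac{\dot{x}\ddot{x} + \dot{y}\ddot{y}}{\Phi}.
% Left-hand side of (16), up to sign:
\dot{x}\,\frac{d}{dt}\frac{\dot{x}}{\Phi} + \dot{y}\,\frac{d}{dt}\frac{\dot{y}}{\Phi}
  = \frac{d}{dt}\left( \frac{\dot{x}^2 + \dot{y}^2}{\Phi} \right)
    - \frac{\dot{x}\ddot{x} + \dot{y}\ddot{y}}{\Phi}
  = \frac{d\Phi}{dt} - \frac{d\Phi}{dt} = 0.
```

Here the first equality is the product rule and the second uses $\dot{x}^2 + \dot{y}^2 = \Phi^2$, so the two Euler equations (15) are indeed dependent for this integrand.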


11. Functionals Depending on Higher-Order Derivatives


So far, we have considered functionals of the form

$$\int_a^b F(x, y, y') \, dx,$$

depending on the function $y(x)$ and its first derivative $y'(x)$, or of the more
general form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx,$$

depending on several functions $y_i(x)$ and their first derivatives $y_i'(x)$.
However, many problems (e.g., in the theory of elasticity) involve functionals
whose integrands contain not only $y_i(x)$ and $y_i'(x)$, but also higher-order
derivatives $y_i''(x), y_i'''(x), \ldots$ The method given above for finding extrema
of functionals (in the context of necessary conditions for weak extrema) can
be carried over to this more general case without essential changes. For
simplicity, we confine ourselves to the case of a single unknown function $y(x)$.

Thus, let $F(x, y, z_1, \ldots, z_n)$ be a function with continuous first and second
(partial) derivatives with respect to all its arguments, and consider a
functional of the form

$$J[y] = \int_a^b F(x, y, y', \ldots, y^{(n)}) \, dx. \tag{17}$$

Then we pose the following problem: Among all functions $y(x)$ belonging to
the space $\mathcal{D}_n(a, b)$ and satisfying the conditions

$$\begin{aligned}
y(a) &= A_0, & y'(a) &= A_1, & \ldots, & & y^{(n-1)}(a) &= A_{n-1}, \\
y(b) &= B_0, & y'(b) &= B_1, & \ldots, & & y^{(n-1)}(b) &= B_{n-1},
\end{aligned} \tag{18}$$

find the function for which (17) has an extremum. To solve this problem, we
start from the general result which states that a necessary condition for a
functional $J[y]$ to have an extremum is that its variation vanish (Theorem 2,
p. 13). Thus, suppose we replace $y(x)$ by the "varied" function $y(x) + h(x)$,⁴
where $h(x)$, like $y(x)$, belongs to $\mathcal{D}_n(a, b)$. By the variation $\delta J$ of the
functional $J[y]$, we mean the expression which is linear in $h, h', \ldots, h^{(n)}$,
and which differs from the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $h, h', \ldots, h^{(n)}$. Since both
$y(x)$ and $y(x) + h(x)$ satisfy the boundary conditions (18), it is clear that

$$\begin{aligned}
h(a) &= h'(a) = \cdots = h^{(n-1)}(a) = 0, \\
h(b) &= h'(b) = \cdots = h^{(n-1)}(b) = 0.
\end{aligned} \tag{19}$$

Next, we use Taylor's theorem, obtaining

$$\begin{aligned}
\Delta J &= \int_a^b [F(x, y + h, y' + h', \ldots, y^{(n)} + h^{(n)}) - F(x, y, y', \ldots, y^{(n)})] \, dx \\
&= \int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)}) \, dx + \cdots,
\end{aligned}$$


⁴ The increment $h(x)$ is often called the variation of $y(x)$. In problems involving
"fixed end point conditions" like (18), we often write $h(x) = \delta y(x)$.
where the dots denote terms of order higher than 1 relative to $h, h', \ldots, h^{(n)}$.
The last integral on the right represents the principal linear part of the
increment $\Delta J$, and hence the variation of $J[y]$ is

$$\delta J = \int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)}) \, dx.$$

Therefore, the necessary condition $\delta J = 0$ for an extremum implies that

$$\int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)}) \, dx = 0. \tag{20}$$

Repeatedly integrating (20) by parts and using the boundary conditions (19),
we find that

$$\int_a^b \left[ F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''} - \cdots + (-1)^n \frac{d^n}{dx^n} F_{y^{(n)}} \right] h(x) \, dx = 0 \tag{21}$$

for any function $h$ which has $n$ continuous derivatives and satisfies (19). It
follows from an obvious generalization of Lemma 1 of Sec. 3.1 that

$$F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''} - \cdots + (-1)^n \frac{d^n}{dx^n} F_{y^{(n)}} = 0, \tag{22}$$

a result again called Euler's equation. Since (22) is a differential equation
of order $2n$, its general solution contains $2n$ arbitrary constants, which can
be determined from the boundary conditions (18).
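
To see equation (22) at work for $n = 2$, here is a short worked sketch with an integrand of the kind met in the theory of elasticity. The specific integrand and the constants $\mu > 0$ and $\rho$ are assumptions introduced for illustration.

```latex
% Illustrative integrand (an assumption, not from the text):
F(x, y, y', y'') = \tfrac{1}{2} \mu \, y''^2 - \rho \, y .
% Partial derivatives entering (22) with n = 2:
F_y = -\rho, \qquad F_{y'} = 0, \qquad F_{y''} = \mu \, y'' .
% Euler's equation (22):
F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''}
  = -\rho + \mu \, y'''' = 0 .
% General solution, with 2n = 4 arbitrary constants:
y = \frac{\rho}{24 \mu} x^4 + C_1 x^3 + C_2 x^2 + C_3 x + C_4 .
```

The four constants $C_1, \ldots, C_4$ are fixed by the four boundary conditions (18), which for $n = 2$ prescribe $y$ and $y'$ at both end points.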


Remark. This derivation of the Euler equation (22) is not completely
rigorous, since the transition from (20) to (21) presupposes the existence
of the derivatives

$$\frac{d}{dx} F_{y'}, \quad \frac{d^2}{dx^2} F_{y''}, \quad \ldots, \quad \frac{d^n}{dx^n} F_{y^{(n)}}. \tag{23}$$

However, by a somewhat more elaborate argument, it can be shown that
(20) implies (22) without this additional hypothesis. In fact, the argument
in question proves the existence of the derivatives (23), as in Lemma 4 of
Sec. 3.1.⁵


⁵ Of course, this argument is unnecessary if it is known in advance that $F$ has
continuous partial derivatives up to order $n + 1$ (with respect to all its arguments).


12. Variational Problems with Subsidiary Conditions


12.1. The isoperimetric problem. In the simplest variational problem
considered in Chapter 1, the class of admissible curves was specified (apart
from certain smoothness requirements) by conditions imposed on the end
points of the curves. However, many applications of the calculus of
variations lead to problems in which not only boundary conditions, but also
conditions of quite a different type known as subsidiary conditions
(synonymously, side conditions or constraints) are imposed on the admissible curves.
As an example, we first consider the isoperimetric problem,⁶ which can be
stated as follows: Find the curve $y = y(x)$ for which the functional

$$J[y] = \int_a^b F(x, y, y') \, dx \tag{24}$$

has an extremum, where the admissible curves satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B,$$

and are such that another functional

$$K[y] = \int_a^b G(x, y, y') \, dx \tag{25}$$

takes a fixed value $l$.

To solve this problem, we assume that the functions $F$ and $G$ defining
the functionals (24) and (25) have continuous first and second derivatives in
$[a, b]$ for arbitrary values of $y$ and $y'$. Then we have


THEOREM 1.⁷ Given the functional

$$J[y] = \int_a^b F(x, y, y') \, dx,$$

let the admissible curves satisfy the conditions

$$y(a) = A, \qquad y(b) = B, \qquad K[y] = \int_a^b G(x, y, y') \, dx = l, \tag{26}$$

where $K[y]$ is another functional, and let $J[y]$ have an extremum for
$y = y(x)$. Then, if $y = y(x)$ is not an extremal of $K[y]$, there exists a
constant $\lambda$ such that $y = y(x)$ is an extremal of the functional

$$\int_a^b (F + \lambda G) \, dx,$$

i.e., $y = y(x)$ satisfies the differential equation

$$F_y - \frac{d}{dx} F_{y'} + \lambda \left( G_y - \frac{d}{dx} G_{y'} \right) = 0. \tag{27}$$

⁶ Originally, the isoperimetric problem referred to the following special problem
(already mentioned on p. 3): Among all closed curves of a given length $l$, find the curve
enclosing the greatest area. This explains the designation "isoperimetric" = "with the
same perimeter."

⁷ The reader will easily recognize the analogy between this theorem and the familiar
method of Lagrange multipliers for finding extrema of functions of several variables,
subject to subsidiary conditions. See e.g., D. V. Widder, op. cit., Chap. 4, Sec. 5,
especially Theorem 5.


Proof. Let $J[y]$ have an extremum for the curve $y = y(x)$, subject to
the conditions (26). We choose two points $x_1$ and $x_2$ in the interval
$[a, b]$, where $x_1$ is arbitrary and $x_2$ satisfies a condition to be stated below,
but is otherwise arbitrary. Then we give $y(x)$ an increment $\delta_1 y(x)
+ \delta_2 y(x)$, where $\delta_1 y(x)$ is nonzero only in a neighborhood of $x_1$, and
$\delta_2 y(x)$ is nonzero only in a neighborhood of $x_2$. (Concerning this
notation, see footnote 4, p. 41.) Using variational derivatives, we can
write the corresponding increment $\Delta J$ of the functional $J$ in the form

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \varepsilon_1 \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_2} + \varepsilon_2 \right\} \Delta\sigma_2, \tag{28}$$

where

$$\Delta\sigma_1 = \int_a^b \delta_1 y(x) \, dx, \qquad \Delta\sigma_2 = \int_a^b \delta_2 y(x) \, dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$ (see the Remark on p. 29).
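We now require that the "varied" curve

$$y = y^*(x) = y(x) + \delta_1 y(x) + \delta_2 y(x)$$

satisfy the condition

$$K[y^*] = K[y].$$

Writing $\Delta K$ in a form similar to (28), we obtain

$$\Delta K = K[y^*] - K[y] = \left\{ \left. \frac{\delta G}{\delta y} \right|_{x = x_1} + \varepsilon_1' \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta G}{\delta y} \right|_{x = x_2} + \varepsilon_2' \right\} \Delta\sigma_2 = 0, \tag{29}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$. Next, we choose $x_2$ to be a point for
which

$$\left. \frac{\delta G}{\delta y} \right|_{x = x_2} \neq 0.$$

Such a point exists, since by hypothesis $y = y(x)$ is not an extremal of
the functional $K$. With this choice of $x_2$, we can write the condition
(29) in the form

$$\Delta\sigma_2 = - \left\{ \frac{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_1}}{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_2}} + \varepsilon' \right\} \Delta\sigma_1, \tag{30}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Setting

$$\lambda = - \frac{\left. \dfrac{\delta F}{\delta y} \right|_{x = x_2}}{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_2}},$$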
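and substituting (30) into the formula (28) for $\Delta J$, we obtain

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \lambda \left. \frac{\delta G}{\delta y} \right|_{x = x_1} \right\} \Delta\sigma_1 + \varepsilon \, \Delta\sigma_1, \tag{31}$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. This expression for $\Delta J$ explicitly involves
variational derivatives only at $x = x_1$, and the increment $h(x)$ is now
just $\delta_1 y(x)$, since the "compensating increment" $\delta_2 y(x)$ has been taken
into account automatically by using the condition $\Delta K = 0$. Thus, the
first term in the right-hand side of (31) is the principal linear part of $\Delta J$,
i.e., the variation of the functional $J$ at the point $x_1$ is

$$\delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \lambda \left. \frac{\delta G}{\delta y} \right|_{x = x_1} \right\} \Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} + \lambda \frac{\delta G}{\delta y} = F_y - \frac{d}{dx} F_{y'} + \lambda \left( G_y - \frac{d}{dx} G_{y'} \right) = 0,$$

which is precisely equation (27). This completes the proof of the
theorem.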


To use Theorem 1 to solve a given isoperimetric problem, we first write
the general solution of (27), which will contain two arbitrary constants in
addition to the parameter $\lambda$. We then determine these three quantities from
the boundary conditions $y(a) = A$, $y(b) = B$ and the subsidiary condition
$K[y] = l$.

Everything just said generalizes immediately to the case of functionals
depending on several functions $y_1, \ldots, y_n$ and subject to several subsidiary
conditions of the form (25). In fact, suppose we are looking for an extremum
of the functional

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx, \tag{32}$$

subject to the conditions

$$y_i(a) = A_i, \qquad y_i(b) = B_i \qquad (i = 1, \ldots, n) \tag{33}$$

and

$$\int_a^b G_j(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx = l_j \qquad (j = 1, \ldots, k). \tag{34}$$

In this case a necessary condition for an extremum is that

$$\frac{\partial}{\partial y_i} \left( F + \sum_{j=1}^k \lambda_j G_j \right) - \frac{d}{dx} \left\{ \frac{\partial}{\partial y_i'} \left( F + \sum_{j=1}^k \lambda_j G_j \right) \right\} = 0 \qquad (i = 1, \ldots, n). \tag{35}$$

The $2n$ arbitrary constants appearing in the solution of the system (35),
and the values of the $k$ parameters $\lambda_1, \ldots, \lambda_k$, sometimes called Lagrange
multipliers, are determined from the boundary conditions (33) and the
subsidiary conditions (34). The proof of (35) is not essentially different
from the proof of Theorem 1, and will not be given here.


12.2. Finite subsidiary conditions. In the isoperimetric problem, the
subsidiary conditions which must be satisfied by the functions $y_1, \ldots, y_n$
are of the form (34), i.e., they are specified by functionals. We now consider
a problem of a different type, which can be stated as follows: Find the
functions $y_i(x)$ for which the functional (32) has an extremum, where the
admissible functions satisfy the boundary conditions

$$y_i(a) = A_i, \qquad y_i(b) = B_i \qquad (i = 1, \ldots, n)$$

and $k$ "finite" subsidiary conditions ($k < n$)

$$g_j(x, y_1, \ldots, y_n) = 0 \qquad (j = 1, \ldots, k). \tag{36}$$

In other words, the functional (32) is not considered for all curves satisfying
the boundary conditions (33), but only for those which lie in the $(n - k)$-dimensional
manifold defined by the system (36).
For simplicity, we confine ourselves to the case $n = 2$, $k = 1$. Then we
have


THEOREM 2. Given the functional

$$J[y, z] = \int_a^b F(x, y, z, y', z') \, dx, \tag{37}$$

let the admissible curves lie on the surface

$$g(x, y, z) = 0 \tag{38}$$

and satisfy the boundary conditions

$$\begin{aligned}
y(a) &= A_1, & y(b) &= B_1, \\
z(a) &= A_2, & z(b) &= B_2,
\end{aligned} \tag{39}$$

and moreover, let $J[y, z]$ have an extremum for the curve

$$y = y(x), \qquad z = z(x). \tag{40}$$

Then, if $g_y$ and $g_z$ do not vanish simultaneously at any point of the surface
(38), there exists a function $\lambda(x)$ such that (40) is an extremal of the
functional

$$\int_a^b [F + \lambda(x) g] \, dx,$$

i.e., satisfies the differential equations

$$\begin{aligned}
F_y + \lambda g_y - \frac{d}{dx} F_{y'} &= 0, \\
F_z + \lambda g_z - \frac{d}{dx} F_{z'} &= 0.
\end{aligned} \tag{41}$$

Proof. As might be expected, the proof of this theorem closely
resembles that of Theorem 1. Let $J[y, z]$ have an extremum for the
curve (40), subject to the conditions (38) and (39), and let $x_1$ be an
arbitrary point of the interval $[a, b]$. Then we give $y(x)$ an increment $\delta y(x)$
and $z(x)$ an increment $\delta z(x)$, where both $\delta y(x)$ and $\delta z(x)$ are nonzero
only in a neighborhood $[\alpha, \beta]$ of $x_1$. Using variational derivatives, we
can write the corresponding increment $\Delta J$ of the functional $J[y, z]$ in the
form

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \varepsilon_1 \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta F}{\delta z} \right|_{x = x_1} + \varepsilon_2 \right\} \Delta\sigma_2, \tag{42}$$

where

$$\Delta\sigma_1 = \int_a^b \delta y(x) \, dx, \qquad \Delta\sigma_2 = \int_a^b \delta z(x) \, dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$.
We now require that the "varied" curve

$$y = y^*(x) = y(x) + \delta y(x), \qquad z = z^*(x) = z(x) + \delta z(x)$$

satisfy the condition⁸

$$g(x, y^*, z^*) = 0.$$

In view of (38), this means that

$$\begin{aligned}
0 &= \int_a^b [g(x, y^*, z^*) - g(x, y, z)] \, dx = \int_a^b (\bar{g}_y \, \delta y + \bar{g}_z \, \delta z) \, dx \\
&= \{ g_y|_{x = x_1} + \varepsilon_1' \} \Delta\sigma_1 + \{ g_z|_{x = x_1} + \varepsilon_2' \} \Delta\sigma_2,
\end{aligned} \tag{43}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$, and the overbar indicates that the
corresponding derivatives are evaluated along certain intermediate curves.
By hypothesis, either $g_y|_{x = x_1}$ or $g_z|_{x = x_1}$ is nonzero. If $g_z|_{x = x_1} \neq 0$, we
can write the condition (43) in the form

$$\Delta\sigma_2 = - \left\{ \frac{g_y|_{x = x_1}}{g_z|_{x = x_1}} + \varepsilon' \right\} \Delta\sigma_1, \tag{44}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Substituting (44) into the formula (42) for
$\Delta J$, we obtain

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} - \left. \left( \frac{g_y}{g_z} \frac{\delta F}{\delta z} \right) \right|_{x = x_1} \right\} \Delta\sigma_1 + \varepsilon \, \Delta\sigma_1,$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. The first term in the right-hand side is the
principal linear part of $\Delta J$, i.e., the variation of the functional $J$ at the
point $x_1$ is

$$\delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} - \left. \left( \frac{g_y}{g_z} \frac{\delta F}{\delta z} \right) \right|_{x = x_1} \right\} \Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} - \frac{g_y}{g_z} \frac{\delta F}{\delta z} = F_y - \frac{d}{dx} F_{y'} - \frac{g_y}{g_z} \left( F_z - \frac{d}{dx} F_{z'} \right) = 0,$$

or

$$\frac{F_y - \dfrac{d}{dx} F_{y'}}{g_y} = \frac{F_z - \dfrac{d}{dx} F_{z'}}{g_z}. \tag{45}$$

Along the curve $y = y(x)$, $z = z(x)$, the common value of the ratios
(45) is some function of $x$. If we denote this function by $-\lambda(x)$, then
(45) reduces to precisely the system (41). This completes the proof of
the theorem.


⁸ The existence of admissible curves $y = y^*(x)$, $z = z^*(x)$ close to the original curve
$y = y(x)$, $z = z(x)$ follows from the implicit function theorem, which goes as follows:
If the equation $g(x, y, z) = 0$ has a solution for $x = x_0$, $y = y_0$, $z = z_0$, if $g(x, y, z)$ and its
first derivatives are continuous in a neighborhood of $(x_0, y_0, z_0)$, and if $g_z(x_0, y_0, z_0) \neq 0$,
then $g(x, y, z) = 0$ defines a unique function $z(x, y)$ which is continuous and
differentiable with respect to $x$ and $y$ in a neighborhood of $(x_0, y_0)$ and satisfies the condition
$z(x_0, y_0) = z_0$. [There is an exactly analogous theorem for the case where
$g_y(x_0, y_0, z_0) \neq 0$.] Thus, if $g_z[x, y(x), z(x)] \neq 0$ in a neighborhood of the point $x_0$,
we can change the curve $y = y(x)$ to $y = y^*(x)$ in this neighborhood and then determine
$z^*(x)$ from the relation $z^*(x) = z[x, y^*(x)]$.


Remark 1. We note without proof that Theorem 2 remains valid when
the class of admissible curves consists of smooth space curves satisfying the
differential equation⁹

$$g(x, y, z, y', z') = 0. \tag{46}$$

More precisely, if the functional $J$ has an extremum for a curve $\gamma$, subject
to the condition (46), and if the derivatives $g_{y'}$, $g_{z'}$ do not vanish
simultaneously along $\gamma$, then there exists a function $\lambda(x)$ such that $\gamma$ is an integral
curve of the system

$$\Phi_y - \frac{d}{dx} \Phi_{y'} = 0, \qquad \Phi_z - \frac{d}{dx} \Phi_{z'} = 0,$$

where

$$\Phi = F + \lambda g.$$


Remark 2. In a certain sense, we can consider a variational problem with
a finite subsidiary condition to be a limiting case of an isoperimetric problem.
In fact, if we assume that the condition (38) does not hold everywhere, but
only at some fixed point

$$g(x_1, y, z) = 0,$$

we obtain a condition whose left-hand side can be regarded as a functional
of $y$ and $z$, i.e., a condition of the type appearing in the isoperimetric problem.
Thus, the condition (38) can be regarded as an infinite set of conditions,
each of which is a functional. As we have seen, in the isoperimetric problem
the number of Lagrange multipliers $\lambda_1, \ldots, \lambda_k$ equals the number of
conditions of constraint. In the same way, the function $\lambda(x)$ appearing in the
problem with a finite subsidiary condition can be interpreted as a "Lagrange
multiplier for each point $x$."


⁹ In mechanics, conditions like (46), which contain derivatives, are called nonholonomic
constraints, and conditions like (38) are called holonomic constraints.


Example 1. Among all curves of length $l$ in the upper half-plane passing
through the points $(-a, 0)$ and $(a, 0)$, find the one which together with the
interval $[-a, a]$ encloses the largest area. We are looking for the function
$y = y(x)$ for which the integral

$$J[y] = \int_{-a}^{a} y \, dx$$

takes the largest value subject to the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = \int_{-a}^{a} \sqrt{1 + y'^2} \, dx = l.$$

Thus, we are dealing with an isoperimetric problem. Using Theorem 1,
we form the functional

$$J[y] + \lambda K[y] = \int_{-a}^{a} \left( y + \lambda \sqrt{1 + y'^2} \right) dx,$$

and write the corresponding Euler equation

$$1 - \lambda \frac{d}{dx} \frac{y'}{\sqrt{1 + y'^2}} = 0,$$

which implies

$$x - \lambda \frac{y'}{\sqrt{1 + y'^2}} = C_1. \tag{47}$$

Integrating (47), we obtain the equation

$$(x - C_1)^2 + (y - C_2)^2 = \lambda^2$$

of a family of circles. The values of $C_1$, $C_2$ and $\lambda$ are then determined from
the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = l.$$
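
The last step, determining the constants from the boundary and length conditions, can be sketched numerically. The following is a minimal sketch, assuming $2a < l \le \pi a$ so that the arc is the graph of a function $y(x) \ge 0$. The function name `circle_through_chord` and the bisection approach are illustrative choices, not from the text. By symmetry $C_1 = 0$, and with $\theta$ denoting half the central angle of the arc, the conditions reduce to $a = R \sin\theta$ and $l = 2R\theta$, where $R = |\lambda|$.

```python
import math

def circle_through_chord(a, l, tol=1e-12):
    """Return (radius, C2) for the circular arc of length l over [-a, a].

    Assumes 2a < l <= pi*a, so that the arc is a graph y = y(x) >= 0.
    """
    if not (2 * a < l <= math.pi * a):
        raise ValueError("need 2a < l <= pi*a")
    target = 2 * a / l             # equals sin(theta)/theta
    lo, hi = 1e-12, math.pi / 2    # sin(theta)/theta decreases from 1 to 2/pi here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.sin(mid) / mid > target:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    radius = l / (2 * theta)          # |lambda| in the text
    c2 = -radius * math.cos(theta)    # height of the centre; C1 = 0 by symmetry
    return radius, c2

r_semi, c_semi = circle_through_chord(1.0, math.pi)   # semicircle case
r_arc, c_arc = circle_through_chord(1.0, 2.5)         # a flatter arc
print(r_semi, c_semi, r_arc, c_arc)
```

For $l = \pi a$ the sketch recovers the semicircle of radius $a$ centred at the origin; for shorter $l$ the centre lies below the $x$-axis.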

Example 2. Among all curves lying on the sphere $x^2 + y^2 + z^2 = a^2$ and
passing through two given points $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$, find the one which
has the least length. The length of the curve $y = y(x)$, $z = z(x)$ is given by
the integral

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2 + z'^2} \, dx.$$

Using Theorem 2, we form the auxiliary functional

$$\int_{x_0}^{x_1} \left[ \sqrt{1 + y'^2 + z'^2} + \lambda(x)(x^2 + y^2 + z^2) \right] dx,$$

and write the corresponding Euler equations

$$\begin{aligned}
2y\lambda(x) - \frac{d}{dx} \frac{y'}{\sqrt{1 + y'^2 + z'^2}} &= 0, \\
2z\lambda(x) - \frac{d}{dx} \frac{z'}{\sqrt{1 + y'^2 + z'^2}} &= 0.
\end{aligned}$$

Solving these equations, we obtain a family of curves depending on four
constants, whose values are determined from the boundary conditions

$$\begin{aligned}
y(x_0) &= y_0, & y(x_1) &= y_1, \\
z(x_0) &= z_0, & z(x_1) &= z_1.
\end{aligned}$$


Remark. As is familiar from elementary analysis, in finding an extremum
of a function of $n$ variables subject to $k$ constraints ($k < n$), we can use the
constraints to express $k$ variables in terms of the other $n - k$ variables.
In this way, the problem is reduced to that of finding an unconstrained 
extremum of a function of n — k variables, i.e., an extremum subject to no 
subsidiary conditions. The situation is the same in the calculus of variations. 
For example, the problem of finding geodesics on a given surface can be 
regarded as a problem subject to a constraint, as in Example 2 of this section. 
On the other hand, if we express the coordinates x, y and z as functions of 
two parameters, we can reduce the problem to that of finding an unconstrained 
extremum, as in Example 2 of Sec. 9. 


PROBLEMS
1. Find the extremals of the functional

$$J[y, z] = \int_0^{\pi/2} (y'^2 + z'^2 + 2yz) \, dx,$$

subject to the boundary conditions

$$y(0) = 0, \qquad y(\pi/2) = 1, \qquad z(0) = 0, \qquad z(\pi/2) = 1.$$


2. Find the extremals of the fixed end point problems corresponding to the
following functionals:

$$\text{a)} \int_{x_0}^{x_1} (y'^2 + z'^2 + y'z') \, dx; \qquad \text{b)} \int_{x_0}^{x_1} (2yz - 2y^2 + y'^2 - z'^2) \, dx.$$

3. Find the extremals of a functional of the form

$$\int_{x_0}^{x_1} F(y', z') \, dx,$$

given that $F_{y'y'} F_{z'z'} - (F_{y'z'})^2 \neq 0$ for $x_0 \le x \le x_1$.


Ans. A family of straight lines in three dimensions.
4. State and prove the generalization of Theorem 3 of Sec. 4.1 for functionals
of the form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx.$$

Hint. The condition $F_{y'y'} \neq 0$ is replaced by the condition $\det \| F_{y_i' y_k'} \| \neq 0$.


5. What is the condition for a functional of the form

$$\int_{t_0}^{t_1} F(t, y_1, \ldots, y_n, \dot{y}_1, \ldots, \dot{y}_n) \, dt,$$

depending on an $n$-dimensional curve $y_i = y_i(t)$, $i = 1, \ldots, n$, to be
independent of the parameterization?


6. Generalizing the definition of Sec. 10, we say that the function $f(x_1, \ldots, x_n)$
is positive-homogeneous of degree $k$ in $x_1, \ldots, x_n$ if

$$f(\lambda x_1, \ldots, \lambda x_n) = \lambda^k f(x_1, \ldots, x_n)$$

for every $\lambda > 0$. Prove the following result, known as Euler's theorem:
If $f(x_1, \ldots, x_n)$ is continuously differentiable and positive-homogeneous of
degree $k$, then

$$\sum_{i=1}^n \frac{\partial f}{\partial x_i} x_i = k f.$$


7. State and prove the converse of Euler's theorem.
8. Verify formula (16) of Sec. 10.
Hint. Use Euler's theorem.


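9. Prove that the Euler equations (15) of the variational problem in parametric
form can be written as

$$\Phi_{x\dot y} - \Phi_{y\dot x} + (\dot x \ddot y - \ddot x \dot y)\,\Phi_1 = 0, \tag{a}$$

where $\Phi_1$ is a positive-homogeneous function of degree $-3$ satisfying the
relations

$$\Phi_{\dot x \dot x} = \dot y^2 \Phi_1, \qquad \Phi_{\dot x \dot y} = -\dot x \dot y\,\Phi_1, \qquad \Phi_{\dot y \dot y} = \dot x^2 \Phi_1.$$

Comment. Equation (a) is known as Weierstrass' form of the Euler
equations. It can also be written as

$$\frac{1}{\rho} = \frac{\Phi_{y\dot x} - \Phi_{x\dot y}}{\Phi_1 (\dot x^2 + \dot y^2)^{3/2}},$$

where $\rho$ is the radius of curvature of the extremal.

10. Prove that Weierstrass' form of the Euler equations is invariant under
parameter changes $t = t(\tau)$, $dt/d\tau > 0$.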


11. Find the extremals of the functional

$$J[y] = \int_0^1 (1 + y''^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y'(0) = 1, \quad y(1) = 1, \quad y'(1) = 1.$$

12. Find the extremals of the functional

$$J[y] = \int_0^{\pi/2} (y''^2 - y^2 + x^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 1, \quad y'(0) = 0, \quad y(\pi/2) = 0, \quad y'(\pi/2) = -1.$$
13. Show that the Euler equation of the functional

$$\int_{x_0}^{x_1} F(x, y, y', y'')\,dx$$

has the first integral

$$F_{y'} - \frac{d}{dx} F_{y''} = \text{const}$$

if the integrand does not depend on y, and the first integral

$$F - y'\left(F_{y'} - \frac{d}{dx} F_{y''}\right) - y'' F_{y''} = \text{const}$$

if the integrand does not depend on x.
14. Find the curve joining the points (0, 0) and (1, 0) for which the integral

$$\int_0^1 y''^2\,dx$$

is a minimum if
a) $y'(0) = a$, $y'(1) = b$;
b) No other conditions are prescribed.
15. Supply the details of the argument mentioned in the remark on p. 42. 


16. By direct calculation, without recourse to variational methods, prove 
that the isosceles triangle has the greatest area among all triangles with a 
given base line and a given perimeter. 


Hint. All the triangles in question have the given base line and a vertex 
lying on a certain ellipse. 


17. Find the equilibrium position of a heavy flexible inextensible cord of 
length l, fastened at its ends.


Hint. Minimize the ordinate of the center of gravity of the cord. By 
making a suitable change of variables, reduce the problem to Example 2 of 
Sec. 4.2. 


18. Find the extremals of the functional

$$J[y] = \int_0^1 (y'^2 + x^2)\,dx,$$

subject to the conditions

$$y(0) = 0, \quad y(1) = 0, \quad \int_0^1 y^2\,dx = 2.$$


19. Suppose an airplane with fixed air speed $v_0$ makes a flight lasting T
seconds. Along what closed curve should it fly if this curve is to enclose
the greatest area? It is assumed that the wind velocity has constant direction
and magnitude $a < v_0$.

Ans. An ellipse whose major axis is perpendicular to the wind velocity
and whose eccentricity is $a/v_0$. The velocity of the airplane is perpendicular
to the radius vector of the ellipse.


20. Given two points A and B in the xy-plane, let $\gamma$ be a fixed curve joining
them. Among all curves of length l joining A and B, find the curve which
together with $\gamma$ encloses the greatest area.


21. Generalizing the preceding problem, suppose the xy-plane is covered by
a mass distribution with continuous density $\mu(x, y)$. As before, let A and B
be two points in the plane, and let $\gamma$ be a fixed curve joining them. Among
all curves of length l joining A and B, find the curve which together with $\gamma$
bounds the region of greatest mass.


Hint. Introduce the auxiliary function $V(x, y) = \int \mu(x, y)\,dx$. Then use
Green's theorem and Weierstrass' form of the Euler equations.


22. Among all curves joining a given point (0, b) on the y-axis to a point on
the x-axis and enclosing a given area S together with the x-axis, find the curve
which generates the least area when rotated about the x-axis.

Ans. The line

$$\frac{x}{a} + \frac{y}{b} = 1,$$

where $ab = 2S$.


3 


THE GENERAL VARIATION 
OF A FUNCTIONAL 


13. Derivation of the Basic Formula 


In this section, we derive the general formula for the variation of 
a functional of the form 


$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{1}$$

beginning with the case where (1) depends on a single function y and hence
reduces to

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx. \tag{2}$$


We assume that all admissible curves are smooth, but, departing from our 
previous hypothesis, we assume that the end points of the curves for which 
(2) is defined can move in an arbitrary way. By the distance between two 
curves y = y(x) and y = y*(x) is meant the quantity 


$$\rho(y, y^*) = \max |y - y^*| + \max |y' - y^{*\prime}| + \rho(P_0, P_0^*) + \rho(P_1, P_1^*), \tag{3}$$

where $P_0$, $P_0^*$ denote the left-hand end points of the curves $y = y(x)$,
$y = y^*(x)$, respectively, and $P_1$, $P_1^*$ denote their right-hand end points.¹
In general, the functions $y$ and $y^*$ are defined on different intervals $I$ and $I^*$.
Thus, in order for (3) to make sense, we have to extend $y$ and $y^*$ onto some
interval containing both $I$ and $I^*$. For example, this can be done by drawing
tangents to the curves at their end points, as shown in Figure 4.


1 In the right-hand side of (3), $\rho$ denotes the ordinary Euclidean distance.
Now let $y = y(x)$ and $y = y^*(x)$ be two neighboring curves, in the sense
of the distance (3), and let²

$$h(x) = y^*(x) - y(x).$$

Moreover, let

$$P_0 = (x_0, y_0), \qquad P_1 = (x_1, y_1)$$

denote the end points of the curve $y = y(x)$, while the end points of the
curve $y = y^*(x) = y(x) + h(x)$ are denoted by

$$P_0^* = (x_0 + \delta x_0,\ y_0 + \delta y_0), \qquad P_1^* = (x_1 + \delta x_1,\ y_1 + \delta y_1).$$


FIGURE 4


The corresponding variation $\delta J$ of the functional $J[y]$ is defined as the
expression which is linear in $h$, $h'$, $\delta x_0$, $\delta y_0$, $\delta x_1$, $\delta y_1$, and which differs from
the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $\rho(y, y + h)$. Since³


$$\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, y + h, y' + h')\,dx - \int_{x_0}^{x_1} F(x, y, y')\,dx \\
&= \int_{x_0}^{x_1} [F(x, y + h, y' + h') - F(x, y, y')]\,dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, y + h, y' + h')\,dx - \int_{x_0}^{x_0 + \delta x_0} F(x, y + h, y' + h')\,dx,
\end{aligned} \tag{4}$$

it follows by using Taylor's theorem and letting the symbol ~ denote equality
except for terms of order higher than 1 relative to $\rho(y, y + h)$ that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} [F_y(x, y, y')\,h + F_{y'}(x, y, y')\,h']\,dx + F(x, y, y')|_{x=x_1}\,\delta x_1 - F(x, y, y')|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx + F|_{x=x_1}\,\delta x_1 + F_{y'} h|_{x=x_1} - F|_{x=x_0}\,\delta x_0 - F_{y'} h|_{x=x_0},
\end{aligned}$$


2 Note that it is no longer appropriate to write $h(x) = \delta y(x)$, as in footnote 4, p. 41.
In fact, in the more precise notation of Sec. 37, $h(x) = \bar{\delta} y(x)$.

3 Recall that we have agreed to extend $y(x)$ and $y^*(x)$ linearly onto the interval
$[x_0, x_1 + \delta x_1]$, so that all integrals in the right-hand side of (4) are meaningful.

where the term containing $h'$ has been integrated by parts. However, it is
clear from Figure 4 that

$$h(x_0) \sim \delta y_0 - y'(x_0)\,\delta x_0, \qquad h(x_1) \sim \delta y_1 - y'(x_1)\,\delta x_1,$$


where ~ has the same meaning as before, and hence 


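$$\begin{aligned}
\delta J = \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx &+ F_{y'}|_{x=x_1}\,\delta y_1 + (F - F_{y'} y')|_{x=x_1}\,\delta x_1 \\
&- F_{y'}|_{x=x_0}\,\delta y_0 - (F - F_{y'} y')|_{x=x_0}\,\delta x_0,
\end{aligned} \tag{5}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx} F_{y'}\right] h(x)\,dx + F_{y'}\,\delta y\,\Big|_{x=x_0}^{x=x_1} + (F - F_{y'} y')\,\delta x\,\Big|_{x=x_0}^{x=x_1},$$

where we define

$$\delta x|_{x=x_i} = \delta x_i, \qquad \delta y|_{x=x_i} = \delta y_i \qquad (i = 0, 1).$$


This is the basic formula for the general variation of the functional $J[y]$.
If the end points of the admissible curves are constrained to lie on the straight
lines $x = x_0$, $x = x_1$, as in the simple variable end point problem considered
in Sec. 6, then $\delta x_0 = \delta x_1 = 0$, while, in the case of the fixed end point
problem, $\delta x_0 = \delta x_1 = 0$ and $\delta y_0 = \delta y_1 = 0$.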
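
Formula (5) can be checked numerically on a simple case. The following Python sketch is an illustration only: the integrand $F = y'^2$, the base curve $y = x^2$ on [0, 1], the perturbation $h = \varepsilon(1 + x)$ and the end point shifts are all assumed choices, not taken from the text. It compares the exact increment $\Delta J$ with the right-hand side of (5); the two should differ by a quantity of second order in $\varepsilon$.

```python
# Sketch: compare the increment of J[y] = integral of y'^2 with formula (5).
# Base curve, perturbation and end point shifts are illustrative assumptions.

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def y(x):
    return x * x            # base curve y = x^2


def yp(x):
    return 2 * x            # its derivative y'


def g(x):
    return 1 + x            # shape of the perturbation, h = eps * g


def gp(x):
    return 1.0              # derivative of g


def increment_and_variation(eps, x0=0.0, x1=1.0, a0=0.3, a1=0.5):
    """Return (Delta J, delta J) for h = eps*g and shifts delta x_i = a_i*eps."""
    dx0, dx1 = a0 * eps, a1 * eps

    def ystar(x):
        return y(x) + eps * g(x)        # neighboring curve y* = y + h

    def ystarp(x):
        return yp(x) + eps * gp(x)

    # Increment: J[y*] on the shifted interval minus J[y] on [x0, x1].
    delta_J = (simpson(lambda x: ystarp(x) ** 2, x0 + dx0, x1 + dx1)
               - simpson(lambda x: yp(x) ** 2, x0, x1))

    # Formula (5) with F = y'^2: F_y = 0, F_y' = 2y', d/dx F_y' = 2y'' = 4,
    # and F - y' F_y' = -y'^2.
    dy0 = ystar(x0 + dx0) - y(x0)       # delta y_0
    dy1 = ystar(x1 + dx1) - y(x1)       # delta y_1
    integral = simpson(lambda x: (0.0 - 4.0) * eps * g(x), x0, x1)
    var_J = (integral
             + 2 * yp(x1) * dy1 - yp(x1) ** 2 * dx1
             - 2 * yp(x0) * dy0 + yp(x0) ** 2 * dx0)
    return delta_J, var_J
```

Calling `increment_and_variation(eps)` for a decreasing sequence of `eps` shows the gap between the two returned values shrinking roughly like `eps**2`, which is what "order higher than 1" means here.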

Next, we return to the more general functional (1), which depends on
n functions $y_1, \ldots, y_n$. Since any system of n functions can be interpreted
as a curve in (n + 1)-dimensional Euclidean space $\mathcal{E}_{n+1}$, we can regard (1)
as defined on some set of curves in $\mathcal{E}_{n+1}$. Paralleling the treatment just
given for n = 1, we now calculate the variation of the functional (1) when
there are no restrictions on the end points of the admissible curves. As
before, we write

$$h_i(x) = y_i^*(x) - y_i(x) \qquad (i = 1, \ldots, n),$$

where for each i, the function $y_i^*(x)$ is close to $y_i(x)$ in the sense of the distance
(3). Moreover, we let

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1)$$

denote the end points of the curve $y_i = y_i(x)$, $i = 1, \ldots, n$, while the end
points of the curve $y_i = y_i^*(x) = y_i(x) + h_i(x)$, $i = 1, \ldots, n$, are denoted by

$$P_0^* = (x_0 + \delta x_0,\ y_1^0 + \delta y_1^0, \ldots, y_n^0 + \delta y_n^0),$$
$$P_1^* = (x_1 + \delta x_1,\ y_1^1 + \delta y_1^1, \ldots, y_n^1 + \delta y_n^1),$$

and once more, we extend the functions $y_i(x)$ and $y_i^*(x)$ linearly onto the
interval $[x_0, x_1 + \delta x_1]$. The corresponding variation $\delta J$ of the functional
$J[y_1, \ldots, y_n]$ is defined as the expression which is linear in $\delta x_0$, $\delta x_1$ and all
the quantities $h_i$, $h_i'$, $\delta y_i^0$, $\delta y_i^1$ ($i = 1, \ldots, n$), and which differs from the
increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to

$$\sum_{i=1}^n \rho(y_i, y_i^*). \tag{6}$$
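Since

$$\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_1} F(x, \ldots, y_i, y_i', \ldots)\,dx \\
&= \int_{x_0}^{x_1} [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)]\,dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_0 + \delta x_0} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx,
\end{aligned}$$

it follows by using Taylor's theorem and letting the symbol ~ denote
equality except for terms of order higher than 1 relative to the quantity (6)
that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} \sum_{i=1}^n (F_{y_i} h_i + F_{y_i'} h_i')\,dx + F|_{x=x_1}\,\delta x_1 - F|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx + F|_{x=x_1}\,\delta x_1 + \sum_{i=1}^n F_{y_i'} h_i|_{x=x_1} \\
&\quad - F|_{x=x_0}\,\delta x_0 - \sum_{i=1}^n F_{y_i'} h_i|_{x=x_0},
\end{aligned}$$

where the terms containing $h_i'$ have been integrated by parts. Just as in the
case n = 1, we have

$$h_i(x_0) \sim \delta y_i^0 - y_i'(x_0)\,\delta x_0, \qquad h_i(x_1) \sim \delta y_i^1 - y_i'(x_1)\,\delta x_1,$$

and hence

$$\begin{aligned}
\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx
&+ \sum_{i=1}^n F_{y_i'}\Big|_{x=x_1} \delta y_i^1 + \left(F - \sum_{i=1}^n y_i' F_{y_i'}\right)\Big|_{x=x_1} \delta x_1 \\
&- \sum_{i=1}^n F_{y_i'}\Big|_{x=x_0} \delta y_i^0 - \left(F - \sum_{i=1}^n y_i' F_{y_i'}\right)\Big|_{x=x_0} \delta x_0,
\end{aligned}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i(x)\,dx + \sum_{i=1}^n F_{y_i'}\,\delta y_i\,\Big|_{x=x_0}^{x=x_1} + \left(F - \sum_{i=1}^n y_i' F_{y_i'}\right) \delta x\,\Big|_{x=x_0}^{x=x_1}, \tag{7}$$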


where, as before, we define

$$\delta x|_{x=x_j} = \delta x_j, \qquad \delta y_i|_{x=x_j} = \delta y_i^j \qquad (j = 0, 1).$$

This is the basic formula for the general variation of the functional
$J[y_1, \ldots, y_n]$.

We now write an even more concise formula for the variation (7), at
the same time introducing some important new ideas, to be discussed in
more detail in the next chapter. Let

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n), \tag{8}$$

and suppose that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')} = \det \|F_{y_i' y_k'}\|$$

is nonzero.⁴ Then we can solve the equations (8) for $y_1', \ldots, y_n'$ as functions
of the variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{9}$$

Next, we express the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in (1)
in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to F by the
formula

$$H = -F + \sum_{i=1}^n y_i' F_{y_i'} \equiv -F + \sum_{i=1}^n y_i' p_i,$$

where the $y_i'$ are regarded as functions of the variables (9). The function
H is called the Hamiltonian (function) corresponding to the functional
$J[y_1, \ldots, y_n]$. In this way, we can make a local transformation (see footnote
2, p. 68) from the "variables" $x, y_1, \ldots, y_n, y_1', \ldots, y_n'$ appearing in (1)
to the new quantities $x, y_1, \ldots, y_n, p_1, \ldots, p_n, H$, called the canonical
variables (corresponding to the functional $J[y_1, \ldots, y_n]$). In terms of the
canonical variables, we can write (7) in the form

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{dp_i}{dx}\right) h_i(x)\,dx + \left(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\right)\Bigg|_{x=x_0}^{x=x_1}.$$

Remark. Suppose the functional $J[y_1, \ldots, y_n]$ has an extremum (in a
certain class of admissible curves) for some curve

$$y_i = y_i(x) \qquad (i = 1, \ldots, n) \tag{10}$$

joining the points

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1).$$

Then, since $J[y_1, \ldots, y_n]$ has an extremum for (10) compared to all admissible
curves, it certainly has an extremum for (10) compared to all curves with
fixed end points $P_0$ and $P_1$. Therefore, (10) is an extremal, i.e., a solution
of the Euler equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n),$$

so that the integral in (7) vanishes, and we are left with the formula

$$\delta J = \left[\sum_{i=1}^n F_{y_i'}\,\delta y_i + \left(F - \sum_{i=1}^n y_i' F_{y_i'}\right) \delta x\right]_{x=x_0}^{x=x_1}, \tag{11}$$

or in canonical variables

$$\delta J = \left(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\right)\Bigg|_{x=x_0}^{x=x_1}. \tag{12}$$

Thus, regardless of the boundary conditions defining our variable end point
problem, the curve for which $J[y_1, \ldots, y_n]$ has an extremum must first be
an extremal and then satisfy the condition that (11) or (12) vanish (see
Problem 1, p. 63).


4 By $\det \|a_{ik}\|$ is meant the determinant of the matrix $\|a_{ik}\|$.


14. End Points Lying on Two Given Curves or Surfaces 


The first two chapters of this book have been devoted mainly to fixed 
end point problems, where the boundary conditions require that all admissible 
curves have two given end points. The only exception is the simple variable 
end point problem considered in Sec. 6, where the end points of the admissible 
curves are free to move along two fixed straight lines parallel to the y-axis. 
We now consider a more general variable end point problem. To keep 
matters simple, we start with the case where there is only one unknown 
function. Our problem can be stated as follows: Among all smooth curves 
whose end points $P_0$ and $P_1$ lie on two given curves $y = \varphi(x)$ and $y = \psi(x)$,
find the curve for which the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx$$

has an extremum. For example, the problem of finding the distance between
two plane curves is of this type, with

$$F(x, y, y') = \sqrt{1 + y'^2}.$$


As shown in the preceding section, the general variation of the functional
$J[y]$ is given by formula (5). If $J[y]$ has an extremum for the curve $y = y(x)$,
then, as noted at the end of Sec. 13, this curve must first of all be an
extremal, i.e., a solution of Euler's equation. Hence, the integral in (5)
vanishes and we have

$$\delta J = F_{y'}|_{x=x_1}\,\delta y_1 + (F - F_{y'} y')|_{x=x_1}\,\delta x_1 - F_{y'}|_{x=x_0}\,\delta y_0 - (F - F_{y'} y')|_{x=x_0}\,\delta x_0,$$

which must vanish if $J[y]$ is to have an extremum for $y = y(x)$.
Next, we observe that according to Figure 5,

$$\delta y_0 = [\varphi'(x_0) + \varepsilon_0]\,\delta x_0, \qquad \delta y_1 = [\psi'(x_1) + \varepsilon_1]\,\delta x_1,$$

where $\varepsilon_0 \to 0$ as $\delta x_0 \to 0$, and $\varepsilon_1 \to 0$ as $\delta x_1 \to 0$. Thus, in the present case,
the condition $\delta J = 0$ becomes

$$\delta J = (F_{y'}\psi' + F - y'F_{y'})|_{x=x_1}\,\delta x_1 - (F_{y'}\varphi' + F - y'F_{y'})|_{x=x_0}\,\delta x_0 = 0, \tag{13}$$

since $\delta J$ contains only terms of the first order in $\delta x_0$ and $\delta x_1$. Since the
increments $\delta x_0$ and $\delta x_1$ are independent, (13) implies the boundary conditions

$$(F_{y'}\varphi' + F - y'F_{y'})|_{x=x_0} = 0, \qquad (F_{y'}\psi' + F - y'F_{y'})|_{x=x_1} = 0,$$

or

$$[F + (\varphi' - y')F_{y'}]|_{x=x_0} = 0, \qquad [F + (\psi' - y')F_{y'}]|_{x=x_1} = 0,$$

called the transversality conditions. The curve $y = y(x)$ satisfying these
conditions is said to be a transversal of the curves $y = \varphi(x)$ and $y = \psi(x)$.
Thus, to solve this kind of variable end point problem, we must first
solve Euler's equation

$$F_y - \frac{d}{dx} F_{y'} = 0, \tag{14}$$

and then use the transversality conditions to determine the values
of the two arbitrary constants appearing in the general solution of (14).


FIGURE 5

In solving variational problems, we often encounter functionals of the
form

$$\int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,dx. \tag{15}$$

For such functionals, the transversality conditions have a particularly simple
appearance. In fact, in this case,

$$F_{y'} = f(x, y)\frac{y'}{\sqrt{1 + y'^2}} = \frac{y'F}{1 + y'^2},$$

so that the transversality conditions become

$$F + (\varphi' - y')F_{y'} = \frac{(1 + y'\varphi')F}{1 + y'^2} = 0, \qquad F + (\psi' - y')F_{y'} = \frac{(1 + y'\psi')F}{1 + y'^2} = 0.$$

It follows that

$$y' = -\frac{1}{\varphi'}$$

at the left-hand end point, while

$$y' = -\frac{1}{\psi'}$$

at the right-hand end point, i.e., for functionals of the form (15),
transversality reduces to orthogonality.
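
The orthogonality statement can be illustrated with the distance problem mentioned at the beginning of this section, where f(x, y) = 1 and the extremals are straight lines. The Python sketch below is only an illustration under assumed data: the curves $y = x^2$ and $y = x - 5$, the search interval, and the helper names are hypothetical choices. It finds the shortest segment joining the two curves by direct minimization and then reports the slope of that segment, which should satisfy $y' = -1/\varphi'$ and $y' = -1/\psi'$ at the respective end points.

```python
import math

# Illustrative curves (assumptions, not from the text).
def phi(x):
    return x * x        # left-hand curve y = phi(x)


def dphi(x):
    return 2.0 * x


def psi(x):
    return x - 5.0      # right-hand curve y = psi(x)


def dpsi(x):
    return 1.0


def ternary_min(f, lo, hi, iters=200):
    """Minimize a unimodal function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def shortest_segment(sweeps=60):
    """Alternately move each end point along its curve to shorten the segment."""
    s, t = 0.0, 0.0     # abscissas of the end points on phi and psi
    for _ in range(sweeps):
        s = ternary_min(lambda u: math.hypot(t - u, psi(t) - phi(u)), -10.0, 10.0)
        t = ternary_min(lambda u: math.hypot(u - s, psi(u) - phi(s)), -10.0, 10.0)
    return s, t


def segment_slope(s, t):
    """Slope y' of the straight extremal joining the two end points."""
    return (psi(t) - phi(s)) / (t - s)
```

With `s, t = shortest_segment()` and `m = segment_slope(s, t)`, the products `m * dphi(s)` and `m * dpsi(t)` should both be close to -1, i.e., the minimizing segment meets both curves at right angles. The alternating search is a convenience for this sketch, not a method taken from the text.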

The same kind of variable end point problem can be posed for functionals
depending on several functions. For example, consider the following
problem: Among all smooth curves whose end points lie on two given surfaces
$x = \varphi(y, z)$ and $x = \psi(y, z)$, find the curve for which the functional

$$J[y, z] = \int_{x_0}^{x_1} F(x, y, z, y', z')\,dx$$

has an extremum. Setting n = 2 in formula (7) of the preceding section, we
obtain the general variation of the functional $J[y, z]$. By the same argument
as in the case of one independent function, we find that the required curve
$y = y(x)$, $z = z(x)$ must again be an extremal, i.e., satisfy the Euler equations

$$F_y - \frac{d}{dx} F_{y'} = 0, \qquad F_z - \frac{d}{dx} F_{z'} = 0.$$

The boundary conditions are now

$$\left[F_{y'} + \frac{\partial \varphi}{\partial y}(F - y'F_{y'} - z'F_{z'})\right]\Bigg|_{x=x_0} = 0,$$
$$\left[F_{z'} + \frac{\partial \varphi}{\partial z}(F - y'F_{y'} - z'F_{z'})\right]\Bigg|_{x=x_0} = 0,$$
$$\left[F_{y'} + \frac{\partial \psi}{\partial y}(F - y'F_{y'} - z'F_{z'})\right]\Bigg|_{x=x_1} = 0,$$
$$\left[F_{z'} + \frac{\partial \psi}{\partial z}(F - y'F_{y'} - z'F_{z'})\right]\Bigg|_{x=x_1} = 0,$$

and are again called the transversality conditions.


15. Broken Extremals. The Weierstrass-Erdmann Conditions 


So far, we have only considered functionals defined for smooth curves,
and hence we have only permitted smooth solutions of variational problems.
However, it is easy to give examples of variational problems which have no
solutions in the class of smooth curves, but which have solutions if we extend
the class of admissible curves to include piecewise smooth curves. Thus,
consider the functional

$$J[y] = \int_{-1}^{1} y^2 (1 - y')^2\,dx, \qquad y(-1) = 0, \quad y(1) = 1.$$

The greatest lower bound of the values of $J[y]$ for smooth $y = y(x)$ satisfying
the boundary conditions is obviously zero, but $J[y]$ does not achieve this value
for any smooth curve. In fact, the minimum is achieved for the curve

$$y = y(x) = \begin{cases} 0 & \text{for } -1 \le x \le 0, \\ x & \text{for } 0 < x \le 1, \end{cases}$$

which has a corner (i.e., a discontinuous first derivative) at the point x = 0.
Such a piecewise smooth extremal with corners is called a broken extremal.
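
The behavior of this functional can be seen numerically. The sketch below is an illustration under assumptions: the smooth one-parameter family $y_\delta(x) = \tfrac{1}{2}\left(x + \sqrt{x^2 + \delta^2} + 1 - \sqrt{1 + \delta^2}\right)$ is a hypothetical choice (it satisfies both boundary conditions and rounds off the corner), and Simpson's rule is used only for convenience. The values of $J[y_\delta]$ are positive and decrease toward zero as $\delta \to 0$, while the broken curve gives exactly zero.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def integrand(y, yp):
    """F(x, y, y') = y^2 (1 - y')^2."""
    return y * y * (1.0 - yp) ** 2


def J_smooth(delta):
    """J for the smooth curve y_delta, which has y(-1) = 0 and y(1) = 1."""
    r = math.sqrt(1.0 + delta * delta)

    def f(x):
        s = math.sqrt(x * x + delta * delta)
        y = 0.5 * (x + s + 1.0 - r)
        yp = 0.5 * (1.0 + x / s)
        return integrand(y, yp)

    return simpson(f, -1.0, 1.0)


def J_broken():
    """J for the broken curve: y = 0 on [-1, 0], y = x on [0, 1]."""
    left = simpson(lambda x: integrand(0.0, 0.0), -1.0, 0.0)
    right = simpson(lambda x: integrand(x, 1.0), 0.0, 1.0)
    return left + right
```

Evaluating `J_smooth` for, say, `delta = 0.5, 0.1, 0.02` gives a decreasing sequence of positive numbers, consistent with the statement that the greatest lower bound is zero but is not attained by a smooth curve.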

Another problem involving broken extremals has already been encountered
in Example 2, p. 20. There it is required to find the curve joining two points
$(x_0, y_0)$ and $(x_1, y_1)$ which generates the surface of least area when rotated
about the x-axis. As already noted, if $y_0$ and $y_1$ are sufficiently small
compared to $x_1 - x_0$, the solution of the problem is given by the broken
extremal $Ax_0x_1B$ shown in Fig. 2(b), p. 21. This extremal consists of three
line segments (two vertical and one horizontal) and can be included in the
class of piecewise smooth curves if we set up the problem in parametric form.

Guided by the above considerations, we enlarge the class of admissible 
functions, relaxing the requirement that they be smooth everywhere. Thus, 
we pose the following problem: Among all functions y(x) which are continuously
differentiable for $a \le x \le b$ except possibly at some point c (a < c < b),
and which satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B, \tag{16}$$

find the function for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx$$

has a weak extremum. It is clear that on each of the intervals [a, c] and
[c, b] the function for which $J[y]$ has an extremum must satisfy the Euler
equation

$$F_y - \frac{d}{dx} F_{y'} = 0. \tag{17}$$

Writing $J[y]$ as a sum of two functionals, i.e.,

$$J[y] = \int_a^b F(x, y, y')\,dx = \int_a^c F(x, y, y')\,dx + \int_c^b F(x, y, y')\,dx \equiv J_1[y] + J_2[y],$$

we calculate the variations $\delta J_1$ and $\delta J_2$ of the two terms separately. The
end points x = a, x = b are fixed, and we require that the two "pieces"
of the function y(x) join continuously at x = c, but otherwise the point
x = c can move freely. Using formula (5) to write $\delta J_1$ and $\delta J_2$, and recalling
that y(x) is an extremal, we find that

$$\delta J_1 = F_{y'}|_{x=c-0}\,\delta y_1 + (F - y'F_{y'})|_{x=c-0}\,\delta x_1,$$
$$\delta J_2 = -F_{y'}|_{x=c+0}\,\delta y_1 - (F - y'F_{y'})|_{x=c+0}\,\delta x_1.$$

[The condition that y(x) be continuous at x = c implies that $\delta J_1$ and $\delta J_2$
involve the same increments $\delta x_1$ and $\delta y_1$.] At an extremum we must have

$$\delta J = \delta J_1 + \delta J_2 = 0,$$

and hence

$$(F_{y'}|_{x=c-0} - F_{y'}|_{x=c+0})\,\delta y_1 + [(F - y'F_{y'})|_{x=c-0} - (F - y'F_{y'})|_{x=c+0}]\,\delta x_1 = 0.$$

Since $\delta x_1$ and $\delta y_1$ are arbitrary, the conditions

$$F_{y'}|_{x=c-0} = F_{y'}|_{x=c+0}, \qquad (F - y'F_{y'})|_{x=c-0} = (F - y'F_{y'})|_{x=c+0}, \tag{18}$$


called the Weierstrass-Erdmann (corner) conditions, hold at the point c 
where the extremal has a corner. 

In each of the intervals [a, c] and [c, b], the extremal y = y(x) must
satisfy Euler’s equation (17), i.e., a second-order differential equation. 
Solving these two equations, we obtain four arbitrary constants, which can 
then be found from the boundary conditions (16) and the Weierstrass- 
Erdmann conditions (18). 
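
As a small illustration of how the corner conditions restrict the admissible slopes, consider an integrand depending on $y'$ alone. The choice $F = (y'^2 - 1)^2$ in the Python sketch below is an assumption made for the example, as are the function names. For such an F the extremals are straight lines on each side of the corner, and (18) becomes a pair of algebraic conditions on the left-hand and right-hand slopes.

```python
# Sketch: Weierstrass-Erdmann conditions (18) for the illustrative integrand
# F(y') = (y'^2 - 1)^2, written as a function of the slope p = y'.

def F(p):
    return (p * p - 1.0) ** 2


def F_p(p):
    """Partial derivative of F with respect to y'."""
    return 4.0 * p * (p * p - 1.0)


def corner_conditions_hold(p_left, p_right, tol=1e-12):
    """Check continuity of F_y' and of F - y' F_y' across a corner."""
    first = abs(F_p(p_left) - F_p(p_right)) < tol
    second = abs((F(p_left) - p_left * F_p(p_left))
                 - (F(p_right) - p_right * F_p(p_right))) < tol
    return first and second
```

Here `corner_conditions_hold(1.0, -1.0)` is true, so a corner joining slopes +1 and -1 is allowed. By contrast, the slopes 0 and 1 satisfy the first condition (both give $F_{y'} = 0$) but fail the second, so both conditions in (18) are needed.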

The Weierstrass-Erdmann conditions take a particularly simple form if 
we use the canonical variables 


$$p = F_{y'}, \qquad H = -F + y'F_{y'}$$


introduced in Sec. 13. In fact, then the conditions (18) just mean that 
the canonical variables are continuous at a point where the extremal has a 
corner. 

The Weierstrass-Erdmann conditions have the following simple geometric 
interpretation: Let x and y take fixed values, plot the values of $y'$ along one
coordinate axis, and plot the values of $F(x, y, y')$ along the other. The
result is a curve, called the indicatrix, representing $F(x, y, y')$ as a function of
$y'$. Then the first of the conditions (18) means that the tangents to the
indicatrix at the points $y'(c - 0)$ and $y'(c + 0)$ are parallel, while the second
condition, which can be written in the form

$$F|_{x=c+0} - F|_{x=c-0} = F_{y'}y'|_{x=c+0} - F_{y'}y'|_{x=c-0},$$


means that the two tangents are not only parallel, but in fact coincide. 


PROBLEMS 


1. Justify the application of Theorem 2, p. 13 to the case of variable end point
problems.
2. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx + G(x_0, y_0, x_1, y_1).$$

3. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y', y'')\,dx.$$

4. Find the curves for which the functional

$$J[y] = \int_0^{\pi/4} (y'^2 - y^2)\,dx$$

can have extrema, given that y(0) = 0, while the right-hand end point can
vary along the line $x = \pi/4$.


5. Find the curves for which the functional

$$J[y] = \int_0^{x_1} \frac{\sqrt{1 + y'^2}}{y}\,dx, \qquad y(0) = 0$$

can have extrema if
a) The point $(x_1, y_1)$ can vary along the line $y = x - 5$;
b) The point $(x_1, y_1)$ can vary along the circle $(x - 9)^2 + y^2 = 9$.
Ans. a) $y = \pm\sqrt{10x - x^2}$; b) $y = \pm\sqrt{8x - x^2}$.


6. Find the curve connecting two given circles in the (vertical) plane along 
which a particle falls in the shortest time under the influence of gravity. 


7. Find the shortest distance between the surfaces $z = \varphi(x, y)$ and $z = \psi(x, y)$.


8. Write the transversality conditions for the functional in Prob. 2 if the end
points of the admissible curves y = y(x) lie on two given curves $y = \varphi(x)$
and $y = \psi(x)$.


9. Write the transversality conditions for a functional of the form

$$J[y, z] = \int_{x_0}^{x_1} f(x, y, z)\sqrt{1 + y'^2 + z'^2}\,dx$$

defined for curves whose end points lie on two given surfaces $z = \varphi(x, y)$
and $z = \psi(x, y)$. Interpret the conditions geometrically.


10. Find the curves for which the functional

$$J[y, z] = \int_0^{x_1} (y'^2 + z'^2 + 2yz)\,dx$$

can have extrema, given that y(0) = z(0) = 0, while the point $(x_1, y_1, z_1)$
can vary in the plane $x = x_1$.
11. Show that for functionals of the form

$$J[y] = \int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,e^{\pm\tan^{-1} y'}\,dx,$$

the transversality conditions reduce to the requirement that the curve y = y(x)
intersect the curves $y = \varphi(x)$ and $y = \psi(x)$ [along which its end points vary]
at an angle of 45°.


12. Find the curves for which the functional

$$J[y] = \int_0^1 (1 + y''^2)\,dx$$

can have extrema, given that y(0) = 0, y'(0) = 1, y(1) = 1, while y'(1) can
vary arbitrarily.


13. Minimize the functional

$$J[y] = \int_{-1}^{1} x^{2/3} y'^2\,dx, \qquad y(-1) = -1, \quad y(1) = 1.$$

Hint. Although the extremal $y = x^{1/3}$ has no derivative at x = 0, it is
easily verified by direct calculation that $y = x^{1/3}$ minimizes $J[y]$.


14. Given an extremal y = y(x), possibly only piecewise smooth, of the
functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

suppose that

$$F_{y'y'}[x, y(x), z] \neq 0$$

for all finite z. Prove that y(x) is then actually smooth, with a smooth
derivative, in $[x_0, x_1]$.


Hint. Use Theorem 3 of Sec. 4 and the geometric interpretation of the
Weierstrass-Erdmann conditions given at the end of Sec. 15.


15. Prove that the functional

$$J[y] = \int_{x_0}^{x_1} (ay'^2 + byy' + cy^2)\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

where $a \neq 0$, can have no broken extremals.


16. Does the functional

$$J[y] = \int_0^{x_1} y'^3\,dx, \qquad y(0) = 0, \quad y(x_1) = y_1$$

have broken extremals?


17. Find the extremals of the functional

$$J[y] = \int_0^4 (y' - 1)^2 (y' + 1)^2\,dx, \qquad y(0) = 0, \quad y(4) = 2$$


which have just one corner.
18. Find the curve for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has an extremum if the curve can arrive at (b, B) only after touching a given
curve $y = \varphi(x)$.


19. Given a curve $y = \varphi(x)$ and two points (a, A), (b, B) lying on opposite
sides of the curve, consider the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B,$$

where $F(x, y, y') = F_1(x, y, y')$ on the side of the curve corresponding to
(a, A), and $F(x, y, y') = F_2(x, y, y')$ on the side of the curve corresponding to
(b, B). Find the curve y = y(x) for which $J[y]$ has an extremum.


20. Using Fermat's principle (pp. 34, 36), specialize the results of Probs. 18
and 19 to functionals of the form

$$\int_a^b f(x, y)\sqrt{1 + y'^2}\,dx,$$

thereby deriving the familiar laws of reflection and refraction for light rays.
21. Find the curves for which the functional

$$J[y] = \int_0^{10} y'^3\,dx, \qquad y(0) = 0, \quad y(10) = 0$$

can have extrema, given that the admissible curves cannot penetrate the interior
of the circle with equation

$$(x - 5)^2 + y^2 = 9.$$

Ans.

$$y = \begin{cases} \pm\frac{3}{4}x & \text{for } 0 \le x \le \frac{16}{5}, \\ \pm\sqrt{9 - (x - 5)^2} & \text{for } \frac{16}{5} \le x \le \frac{34}{5}, \\ \mp\frac{3}{4}(x - 10) & \text{for } \frac{34}{5} \le x \le 10. \end{cases}$$


4 


THE CANONICAL FORM 
OF THE EULER EQUATIONS 
AND RELATED TOPICS 


As already remarked in Sec. 1, many physical laws can be expressed as
variational principles, i.e., in terms of extremal properties of certain func- 
tionals. In this chapter, we shall illustrate this situation by using variational 
methods to study the classical mechanics of a system consisting of a finite 
number of particles. For example, we shall show how the trajectories in 
phase space of a mechanical system (which describe how the system evolves 
in time) can be found as the extremals of a certain functional. By using the 
calculus of variations, we can also find quantities connected with a given 
physical system which do not change as the system evolves in time. These 
and related ideas will be our chief concern here. First, we return to the 
subject of canonical variables (introduced in Sec. 13), and discuss the reduction
of the Euler equations to canonical form. Appendix I (p. 208) is closely
related to the subject matter of this chapter, and contains another, independent 
derivation of the canonical equations and the Hamilton-Jacobi equation. 


16. The Canonical Form of the Euler Equations 


The Euler equations corresponding to the functional 


$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{1}$$

(which depends on n functions) form a system of n second-order differential
equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{2}$$

This system can be reduced (in various ways) to a system of 2n first-order
differential equations. For example, regarding $y_1', \ldots, y_n'$ as n new functions,
independent of $y_1, \ldots, y_n$, we can write (2) in the form

$$\frac{dy_i}{dx} = y_i', \qquad F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{3}$$


where $y_1, \ldots, y_n, y_1', \ldots, y_n'$ are 2n unknown functions, and x is the independent
variable.¹ However, we obtain a much more convenient and symmetric
form of the Euler equations if we replace $x, y_1, \ldots, y_n, y_1', \ldots, y_n'$ by another
set of variables, i.e., the canonical variables introduced in the preceding
chapter. The reader will recall that in Sec. 13, we used the equations

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n) \tag{4}$$

to write $y_1', \ldots, y_n'$ as functions of the variables²

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{5}$$

Then we expressed the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in
(1) in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to F by
the formula

$$H = -F + \sum_{i=1}^n y_i' p_i, \tag{6}$$

where the $y_i'$ are regarded as functions of the variables (5). The function
H is called the Hamiltonian (corresponding to the functional $J[y_1, \ldots, y_n]$).
Finally, we introduced the new variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n, H, \tag{7}$$


called the canonical variables (corresponding to the functional $J[y_1, \ldots, y_n]$),
which were used on p. 58 to write a concise expression for the general variation
of the functional $J[y_1, \ldots, y_n]$, and on p. 63 to give a simple interpretation
of the Weierstrass-Erdmann conditions.


1 In other words, here (and elsewhere in this chapter), we regard the $y_i'$ as new
"variables." To avoid confusion, it would be preferable to write $z_i$ instead of $y_i'$, but
we shall adhere to the commonly accepted notation. Thus, in cases where we are
concerned with the derivative of a function $y_i$, we shall emphasize this fact by writing
$dy_i/dx$ instead of $y_i'$.


2 As already noted on p. 58, in making the transition from the variables $x, y_1, \ldots, y_n$,
$y_1', \ldots, y_n'$ to the variables $x, y_1, \ldots, y_n, p_1, \ldots, p_n$, we require that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')} = \det \|F_{y_i' y_k'}\|$$

be nonzero. We shall assume that this condition is satisfied. However, it should be
kept in mind that this condition guarantees only the local "solvability" of the equations
(4) with respect to $y_1', \ldots, y_n'$, but it does not guarantee the possibility of representing
$y_1', \ldots, y_n'$ as functions of $x, y_1, \ldots, y_n, p_1, \ldots, p_n$ which are defined over the whole
region under discussion. Thus, all our considerations have a local character.

We now show how the Euler equations (3) transform when we go over to 
canonical variables. In order to make this change in the Euler equations, 
we have to express the partial derivatives $F_{y_i}$ (i.e., the partial derivatives of F
with respect to $y_i$, evaluated for constant $x, y_1', \ldots, y_n'$) in terms of the partial
derivatives $H_{y_i}$ (evaluated for constant $x, p_1, \ldots, p_n$).³ The direct evaluation
of these derivatives would be rather formidable. Therefore, to avoid lengthy 
calculations, we write the expression for the differential of the function H. 
Then, using the fact that the first differential of a function does not depend 
on the choice of independent variables (i.e., is invariant under changes 
of the independent variables), we shall obtain the required formulas quite 
easily. 

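By the definition of H, we have

$$dH = -dF + \sum_{i=1}^n p_i\,dy_i' + \sum_{i=1}^n y_i'\,dp_i,$$

so that

$$dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^n \frac{\partial F}{\partial y_i}\,dy_i - \sum_{i=1}^n \frac{\partial F}{\partial y_i'}\,dy_i' + \sum_{i=1}^n p_i\,dy_i' + \sum_{i=1}^n y_i'\,dp_i. \tag{8}$$

Ordinarily, before using (8) to obtain expressions for the partial derivatives
of H, we would have to express the $dy_i'$ in terms of $x$, $y_i$ and $p_i$. However
(and this is the important feature of the canonical variables), because of the
relations

$$\frac{\partial F}{\partial y_i'} = p_i \qquad (i = 1, \ldots, n),$$

the terms containing $dy_i'$ in (8) cancel each other out, and we obtain

$$dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^n \frac{\partial F}{\partial y_i}\,dy_i + \sum_{i=1}^n y_i'\,dp_i. \tag{9}$$

Thus, to obtain the partial derivatives of H, we need only write down the
appropriate coefficients of the differentials in the right-hand side of (9), i.e.,

$$\frac{\partial H}{\partial x} = -\frac{\partial F}{\partial x}, \qquad \frac{\partial H}{\partial y_i} = -\frac{\partial F}{\partial y_i}, \qquad \frac{\partial H}{\partial p_i} = y_i'.$$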

3 The notation ordinarily used in analysis to denote partial derivatives suffers from 
the familiar defect of not specifying just which variables are held fixed.

In other words, the quantities $\partial F/\partial y_i$ and $y_i'$ are connected with the partial
derivatives of the function H by the formulas

$$y_i' = \frac{\partial H}{\partial p_i}, \qquad \frac{\partial F}{\partial y_i} = -\frac{\partial H}{\partial y_i}. \tag{10}$$

Finally, using (10), we can write the Euler equations (3) in the form

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \tag{11}$$


These 2n first-order differential equations form a system which is equivalent 
to the system (3) and is called the canonical system of Euler equations (or 
simply the canonical Euler equations) for the functional (1). 
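
A minimal numerical sketch of the canonical system may help fix ideas. The integrand $F = \tfrac{1}{2}(y'^2 - y^2)$ used below is an assumed example, not one from the text: for it, (4) gives $p = y'$ and (6) gives $H = \tfrac{1}{2}(p^2 + y^2)$, so that (11) reads $dy/dx = p$, $dp/dx = -y$. The Python code integrates this system with a standard fourth-order Runge-Kutta step (a choice of convenience) and lets one compare the result with the known extremal $y = \sin x$ for the initial data $y(0) = 0$, $p(0) = 1$.

```python
import math


def hamiltonian(y, p):
    """H = -F + y'p for the illustrative integrand F = (y'^2 - y^2)/2."""
    return 0.5 * (p * p + y * y)


def rhs(x, y, p):
    """Right-hand sides of the canonical equations (11) for this H."""
    dH_dp = p
    dH_dy = y
    return dH_dp, -dH_dy


def integrate_canonical(x, y, p, x_end, steps=1000):
    """Integrate dy/dx = dH/dp, dp/dx = -dH/dy from x to x_end by RK4."""
    h = (x_end - x) / steps
    for _ in range(steps):
        k1y, k1p = rhs(x, y, p)
        k2y, k2p = rhs(x + h / 2, y + h / 2 * k1y, p + h / 2 * k1p)
        k3y, k3p = rhs(x + h / 2, y + h / 2 * k2y, p + h / 2 * k2p)
        k4y, k4p = rhs(x + h, y + h * k3y, p + h * k3p)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        x += h
    return y, p
```

For example, `integrate_canonical(0.0, 0.0, 1.0, 1.0)` should return values close to `(math.sin(1.0), math.cos(1.0))`, and `hamiltonian` evaluated on the result should stay close to its initial value 1/2, since this F does not depend on x explicitly.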


17. First Integrals of the Euler Equations 


It will be recalled that a first integral of a system of differential equations is 
a function which has a constant value along each integral curve of the system. 
We now look for first integrals of the canonical system (11), and hence of the 
original system (3) which is equivalent to (11). First, we consider the case 
where the function F defining the functional (1) does not depend on x 
explicitly, ie., is of the form F()1,...,¥n, Vi,---> Yn)» Then the function 


H=-F+ > vip 
t=1 


also does not depend on x explicitly, and hence 


nS (2d 2H dy 


Wk ae on ae (12) 


i=l 
Using the Euler equations in the canonical form (11), we find that (12) 
becomes 


> 


ae Dee a a 


dH — = (= OH oH =) 
NO Op; Op, Oy; 


along each extremal.* Thus, if F does not depend on x explicitly, the function 
H(yy,. «+> Yn» Pty ++ +s Pra) iS @ first integral of the Euler equations.® 


4 If H depends on x explicitly, the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

can be derived by the same argument.
5 Cf. the discussion in Case 2, p. 18 of the integration of Euler's equation for
functionals which are independent of x.

Next, we consider an arbitrary function of the form

$$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n),$$


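and we examine the conditions under which $\Phi$ will be a first integral of the
system (11). We drop the assumption that F does not depend on x explicitly,
and instead we consider the general case. Along each integral curve of the
system (11), we have

$$\frac{d\Phi}{dx} = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial \Phi}{\partial p_i}\frac{dp_i}{dx} \right) = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i} \right) = [\Phi, H],$$

where the expression

$$[\Phi, H] = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i} \right)$$

is called the Poisson bracket of the functions $\Phi$ and H. Thus, we have
proved the formula

$$\frac{d\Phi}{dx} = [\Phi, H]. \tag{13}$$

It follows from (13) that a necessary and sufficient condition for a function
$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n)$ to be a first integral of the system of Euler
equations (11) is that the Poisson bracket $[\Phi, H]$ vanish identically.⁶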
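The criterion $[\Phi, H] = 0$ is easy to test numerically. The following sketch approximates the Poisson bracket by central differences; the Hamiltonian $H = \tfrac12(p_1^2 + p_2^2) + \tfrac12(y_1^2 + y_2^2)$ and the candidate $\Phi = y_1 p_2 - y_2 p_1$ are illustrative assumptions, and `poisson_bracket` is a hypothetical helper rather than part of any library.

```python
def poisson_bracket(phi, ham, y, p, eps=1e-5):
    """Approximate [phi, ham] = sum_i (phi_yi * ham_pi - phi_pi * ham_yi)."""
    def derivative(func, block, i):
        # Central difference with respect to y[i] (block == "y") or p[i] (block == "p").
        y_up, p_up, y_dn, p_dn = list(y), list(p), list(y), list(p)
        if block == "y":
            y_up[i] += eps
            y_dn[i] -= eps
        else:
            p_up[i] += eps
            p_dn[i] -= eps
        return (func(y_up, p_up) - func(y_dn, p_dn)) / (2 * eps)

    return sum(derivative(phi, "y", i) * derivative(ham, "p", i)
               - derivative(phi, "p", i) * derivative(ham, "y", i)
               for i in range(len(y)))

def ham(y, p):
    """Assumed Hamiltonian, symmetric under rotations of the (y1, y2) plane."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + 0.5 * (y[0] ** 2 + y[1] ** 2)

def phi(y, p):
    """Candidate first integral y1*p2 - y2*p1."""
    return y[0] * p[1] - y[1] * p[0]

def first_coordinate(y, p):
    """A function that is not a first integral: its bracket with ham equals p1."""
    return y[0]

point_y, point_p = [0.3, -1.2], [0.7, 0.4]
bracket_phi = poisson_bracket(phi, ham, point_y, point_p)
bracket_y1 = poisson_bracket(first_coordinate, ham, point_y, point_p)
print(bracket_phi, bracket_y1)
```

The first bracket vanishes up to rounding, so this $\Phi$ is a first integral for the assumed H, while the second bracket reproduces $dy_1/dx = p_1$, in agreement with (13).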


18. The Legendre Transformation 


We now consider another method of reducing the Euler equations to 
canonical form, a method which differs from that presented in Sec. 16. 
The idea of this new method is to replace the variational problem under 
consideration by another, equivalent problem, such that the Euler equations 
for the new problem are the same as the canonical Euler equations for the 
original problem. 


18.1. We begin by discussing some related topics from the theory of 
extrema of functions of n variables. First, we consider the case n = 1. 


6 According to the existence theorem for the system (11), there is an integral curve of
the system passing through any given point $(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$. Hence, if
$[\Phi, H] = 0$ along every integral curve, it follows that $[\Phi, H] \equiv 0$. If $\Phi$ (as well as H)
depends on x explicitly, it is easily verified that (13) is replaced by

$$\frac{d\Phi}{dx} = \frac{\partial \Phi}{\partial x} + [\Phi, H].$$

Suppose we are looking for an extremum, say a minimum, of the function
$f(\xi)$, and suppose $f(\xi)$ is (strictly) convex, which means that

$$f''(\xi) > 0$$

wherever $f(\xi)$ is defined. We introduce a new independent variable

$$p = f'(\xi), \tag{14}$$

called the tangential coordinate, which is just the slope of the tangent passing
through a given point of the curve $y = f(\xi)$. Since by hypothesis

$$\frac{dp}{d\xi} = f''(\xi) > 0,$$

we can use (14) to express $\xi$ in terms of p. In fact, since the function $f(\xi)$
is convex, any point of the curve $y = f(\xi)$ is uniquely determined by the slope
of its tangent (see Figure 6). Of course, the same is true for a (strictly) concave
function, i.e., a function such that $f''(\xi) < 0$ everywhere.

[Figure 6]

We now introduce the new function

$$H(p) = -f(\xi) + p\xi, \tag{15}$$

where $\xi$ is regarded as the function of p obtained by solving (14). The
transformation from the variable and function pair $\xi$, $f(\xi)$ to the variable
and function pair p, H(p), defined by formulas (14) and (15), is called the
Legendre transformation. It is easy to see that since $f(\xi)$ is convex, so is H(p).
[The convex functions H(p) and $f(\xi)$ are sometimes said to be conjugate.] In fact,

$$dH = -f'(\xi)\,d\xi + p\,d\xi + \xi\,dp$$

implies that

$$\frac{dH}{dp} = \xi, \tag{16}$$

and hence

$$\frac{d^2H}{dp^2} = \frac{d\xi}{dp} = \frac{1}{dp/d\xi} = \frac{1}{f''(\xi)} > 0, \tag{17}$$

since $f''(\xi) > 0$. Moreover, if the Legendre transformation is applied to
the pair p, H(p), we get back the pair $\xi$, $f(\xi)$. This follows from (16) and
the relation

$$-H(p) + pH'(p) = f(\xi) - pH'(p) + pH'(p) = f(\xi). \tag{18}$$


Thus, the Legendre transformation is an involution, i.e., a transformation
which is its own inverse.

Example. If

$$f(\xi) = \frac{\xi^a}{a} \qquad (a > 1),$$

then

$$f'(\xi) = p = \xi^{a-1},$$

i.e.,

$$\xi = p^{1/(a-1)}.$$

It follows that

$$H = -\frac{p^{a/(a-1)}}{a} + p \cdot p^{1/(a-1)} = p^{a/(a-1)} \left( -\frac{1}{a} + 1 \right),$$

and therefore

$$H(p) = \frac{p^b}{b},$$

where b is related to a by the formula

$$\frac{1}{a} + \frac{1}{b} = 1.$$
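The example can be checked numerically. The sketch below builds H(p) directly from formulas (14) and (15) for the particular value a = 3 (an arbitrary choice), compares it with $p^b/b$, and then applies the transformation a second time to recover $f(\xi)$. We restrict ourselves to $\xi > 0$, $p > 0$, and the helper name `legendre` is hypothetical.

```python
def legendre(inverse_derivative, func):
    """Return G with G(q) = -func(u) + q*u, where u = inverse_derivative(q) solves q = func'(u)."""
    def transformed(q):
        u = inverse_derivative(q)
        return -func(u) + q * u
    return transformed

a = 3.0
b = a / (a - 1.0)                      # conjugate exponent: 1/a + 1/b = 1

def f(xi):
    """The function of the example, f(xi) = xi**a / a, for xi > 0."""
    return xi ** a / a

def xi_of_p(p):
    """Inverse of p = f'(xi) = xi**(a - 1)."""
    return p ** (1.0 / (a - 1.0))

H = legendre(xi_of_p, f)

def p_of_xi(xi):
    """Inverse of xi = H'(p) = p**(b - 1)."""
    return xi ** (1.0 / (b - 1.0))

f_back = legendre(p_of_xi, H)          # second transformation: should reproduce f

samples = (0.5, 1.0, 2.0)
error_H = max(abs(H(p) - p ** b / b) for p in samples)
error_involution = max(abs(f_back(xi) - f(xi)) for xi in samples)
print(error_H, error_involution)
```

Both errors are at the level of rounding, which illustrates the formula $H(p) = p^b/b$ and the involution property for this one choice of a.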

Next, we show that if

$$-H(p) + \xi p \tag{19}$$

is regarded as a function of two variables, then

$$f(\xi) = \max_{p}\, [-H(p) + \xi p]. \tag{20}$$

[In fact, we can use (20) instead of (15) to define the function H(p).] To
prove this result, we note that according to (18), the function (19) reduces
to $f(\xi)$ when the condition

$$\frac{\partial}{\partial p} [-H(p) + \xi p] = -H'(p) + \xi = 0,$$

or

$$\xi = H'(p),$$

is satisfied. Thus, $f(\xi)$ is an extremum of the function $-H(p) + \xi p$,
regarded as a function of p. Moreover, the extremum is a maximum since

$$\frac{\partial^2}{\partial p^2} [-H(p) + \xi p] = -H''(p) < 0$$

[cf. (17)]. It follows that

$$\min_{\xi} f(\xi) = \min_{\xi} \max_{p}\, [-H(p) + \xi p],$$


i.e., the extremum of $f(\xi)$ is also an extremum of (19), regarded as a function
of two variables.

Similar considerations apply to functions of several independent variables.
Let

$$f(\xi_1, \ldots, \xi_n)$$

be a function of n variables such that

$$\det \| f_{\xi_i \xi_k} \| \neq 0, \tag{21}$$

and let

$$p_i = f_{\xi_i} \qquad (i = 1, \ldots, n). \tag{22}$$

Then, using (22) to write $\xi_1, \ldots, \xi_n$ in terms of $p_1, \ldots, p_n$, we form the
function

$$H(p_1, \ldots, p_n) = -f + \sum_{i=1}^{n} \xi_i p_i.$$

As in the case of one variable, it can be shown that

$$f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right]$$

and

$$\operatorname*{ext}_{\xi_1, \ldots, \xi_n} f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{\xi_1, \ldots, \xi_n,\, p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} \xi_i p_i \right],$$

where ext denotes the operation of taking an extremum with respect to the
indicated variables. In other words, the extremum of $f(\xi_1, \ldots, \xi_n)$ is also
an extremum of

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i$$

regarded as a function of 2n variables.


Remark. If instead of (21), we impose the stronger condition that the
matrix

$$\| f_{\xi_i \xi_k} \|$$

be positive definite, i.e., that the quadratic form

$$\sum_{i,k=1}^{n} f_{\xi_i \xi_k} \alpha_i \alpha_k$$

be positive for arbitrary real numbers $\alpha_1, \ldots, \alpha_n$ (not all zero),⁷ then

$$f(\xi_1, \ldots, \xi_n) = \max_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} \xi_i p_i \right]. \tag{23}$$

It follows from (23) that

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \leq f(\xi_1, \ldots, \xi_n)$$

for arbitrary $p_1, \ldots, p_n$, i.e.,

$$\sum_{i=1}^{n} p_i \xi_i \leq H(p_1, \ldots, p_n) + f(\xi_1, \ldots, \xi_n),$$

a result known as Young's inequality.

7 This is the condition for the function $f(\xi_1, \ldots, \xi_n)$ to be (strictly) convex.
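For n = 1 and the conjugate pair $f(\xi) = \xi^a/a$, $H(p) = p^b/b$ of the earlier example, Young's inequality reads $p\xi \leq p^b/b + \xi^a/a$. The sketch below samples random positive values to look for a violation and also checks the case of equality $p = \xi^{a-1}$. The exponent a = 2.5, the sampling range and the name `young_gap` are all assumptions made for the illustration.

```python
import random

def young_gap(p, xi, a):
    """Return f(xi) + H(p) - p*xi for f = xi**a / a and H = p**b / b, 1/a + 1/b = 1."""
    b = a / (a - 1.0)
    return xi ** a / a + p ** b / b - p * xi

random.seed(0)                         # fixed seed so that the run is reproducible
a = 2.5
gaps = [young_gap(random.uniform(0.01, 5.0), random.uniform(0.01, 5.0), a)
        for _ in range(1000)]
smallest_gap = min(gaps)

# Equality should hold exactly when p = f'(xi) = xi**(a - 1).
xi = 1.7
equality_gap = young_gap(xi ** (a - 1.0), xi, a)
print(smallest_gap, equality_gap)
```

A random search of this kind proves nothing, of course, but it shows the inequality at work and confirms that the gap closes on the curve $p = f'(\xi)$.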


18.2. We now apply the considerations of Sec. 18.1 to functionals. Given
a functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{24}$$

we set

$$p = F_{y'}(x, y, y') \tag{25}$$

and

$$H(x, y, p) = -F + py'. \tag{26}$$

Here we assume that $F_{y'y'} \neq 0$, so that (25) defines y' as a function of x,
y and p. Then we introduce the new functional

$$J[y, p] = \int_a^b [-H(x, y, p) + py']\,dx, \tag{27}$$


where y and p are regarded as two independent functions, and y’ is the deriva- 
tive of y. This functional is obviously the same as the original functional 
(24), if we choose p to be given by the expression (25). The Euler equations 
for the functional (27) are 


$$-\frac{\partial H}{\partial y} - \frac{dp}{dx} = 0, \qquad -\frac{\partial H}{\partial p} + \frac{dy}{dx} = 0, \tag{28}$$

i.e., just the canonical equations for the functional (24). If we can show
that the functionals (24) and (27) have their extrema for the same curves,
this will prove that the equation

$$F_y - \frac{d}{dx} F_{y'} = 0 \tag{29}$$


and the equations (28) are equivalent, thereby providing a new derivation of 

the canonical equations, independent of the derivation given in Sec. 16. 
First, we observe that the transformation from the variables x, y, y' and
the function F to the variables x, y, p and the function H, defined by formulas
(25) and (26), is an involution, i.e., if we subject H(x, y, p) to a Legendre
transformation, we get back the function F(x, y, y'). In fact, since

$$dH = -\frac{\partial F}{\partial x}\,dx - \frac{\partial F}{\partial y}\,dy + y'\,dp,$$

it follows that

$$\frac{\partial H}{\partial p} = y',$$

and hence

$$-H + p\,\frac{\partial H}{\partial p} = F - py' + py' = F. \tag{30}$$

[Cf. formula (9) of Sec. 16.]

Next, we note that to prove the equivalence of the variational problems (24) 
and (27), it is sufficient to show that J[y] is an extremum of J[y, p] when p 
is varied and y is held fixed, symbolically 


$$J[y] = \operatorname*{ext}_{p} J[y, p], \tag{31}$$

since then an extremum of J[y, p] when both p and y are varied will be an
extremum of J[y]. Since J[y, p] does not contain p', to find an extremum
of J[y, p] it is sufficient to find an extremum of the integrand in (27) at every
point (cf. Case 3, p. 19). Thus we have

$$\frac{\partial}{\partial p} [-H + py'] = 0,$$

from which it follows that

$$y' = \frac{\partial H}{\partial p}.$$

But this implies (31), since

$$-H + p\,\frac{\partial H}{\partial p} = F$$

according to (30). Thus, we have proved the equivalence of the variational
problems (24) and (27), and of the corresponding Euler equations (28) and 
(29). Although we have only considered functionals depending on a single 
function, completely analogous considerations apply to the case of functionals 
depending on several functions. 


Example. Consider the functional

$$\int_a^b (Py'^2 + Qy^2)\,dx, \tag{32}$$

where P and Q are functions of x. In this case,

$$p = 2Py', \qquad H = Py'^2 - Qy^2,$$

and hence

$$H = \frac{p^2}{4P} - Qy^2.$$

The corresponding canonical equations are

$$\frac{dp}{dx} = 2Qy, \qquad \frac{dy}{dx} = \frac{p}{2P},$$

while the usual form of the Euler equation for the functional (32) is

$$2yQ - \frac{d}{dx}(2Py') = 0.$$
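To see the two forms side by side, the following sketch integrates both the canonical equations and the second-order Euler equation for the functional (32), with the coefficient functions P(x) = 1 + x and Q(x) = 1 chosen purely for illustration. Written out, the Euler equation becomes $y'' = (Qy - P'y')/P$. All function names are hypothetical.

```python
def rk4_step(rhs, x, state, h):
    """Advance state' = rhs(x, state) by one classical Runge-Kutta step."""
    k1 = rhs(x, state)
    k2 = rhs(x + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
    k3 = rhs(x + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
    k4 = rhs(x + h, [s + h * k for s, k in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(rhs, state, x0, x1, steps):
    """Integrate from x0 to x1 with a fixed number of RK4 steps."""
    h = (x1 - x0) / steps
    for k in range(steps):
        state = rk4_step(rhs, x0 + k * h, state, h)
    return state

def P(x):
    return 1.0 + x          # assumed coefficient P(x)

def dP(x):
    return 1.0              # its derivative P'(x)

def Q(x):
    return 1.0              # assumed coefficient Q(x)

def canonical_rhs(x, state):
    """dy/dx = p / (2P), dp/dx = 2Qy."""
    y, p = state
    return [p / (2.0 * P(x)), 2.0 * Q(x) * y]

def euler_rhs(x, state):
    """Euler equation 2Qy - (2Py')' = 0 as a first-order system in (y, y')."""
    y, v = state
    return [v, (Q(x) * y - dP(x) * v) / P(x)]

y0, v0 = 1.0, 0.5
p0 = 2.0 * P(0.0) * v0      # p = 2Py' at the initial point
y_canonical, p_canonical = integrate(canonical_rhs, [y0, p0], 0.0, 1.0, 1000)
y_euler, v_euler = integrate(euler_rhs, [y0, v0], 0.0, 1.0, 1000)
print(abs(y_canonical - y_euler), abs(p_canonical - 2.0 * P(1.0) * v_euler))
```

The two integrations agree to within discretization error, and the computed p matches $2Py'$ at the end point.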


19. Canonical Transformations 


Next, we look for transformations under which the canonical Euler
equations preserve their canonical form. The reader will recall that in Sec. 8
we proved the invariance of the Euler equation

$$F_y - \frac{d}{dx} F_{y'} = 0$$

under coordinate transformations of the form

$$u = u(x, y), \quad v = v(x, y), \qquad \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix} \neq 0.$$

(Such transformations change y' to dv/du in the original functional.) The
canonical Euler equations also have this invariance property. Furthermore,
because of the symmetry between the variables $y_i$ and $p_i$ in the canonical
equations, they permit even more general changes of variables, i.e., we can
transform the variables x, $y_i$, $p_i$ into new variables

$$x, \qquad Y_i = Y_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \qquad P_i = P_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n). \tag{33}$$

In other words, we can think of letting the $p_i$ transform according to their
own formulas, independently of how the variables $y_i$ transform. However,
the canonical equations do not preserve their form under all transformations
(33). We now study the conditions which have to be imposed on the
transformations (33) if the Euler equations are to continue to be in canonical
form when written in the new variables, i.e., if the canonical equations are to
transform into new equations

$$\frac{dY_i}{dx} = \frac{\partial H^*}{\partial P_i}, \qquad \frac{dP_i}{dx} = -\frac{\partial H^*}{\partial Y_i}, \tag{34}$$

where $H^* = H^*(x, Y_1, \ldots, Y_n, P_1, \ldots, P_n)$ is some new function.
Transformations of the form (33) which preserve the canonical form of the Euler
equations are called canonical transformations.

To find such canonical transformations, we use the fact that the canonical
equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \tag{35}$$

are the Euler equations of the functional

$$J[y_1, \ldots, y_n, p_1, \ldots, p_n] = \int_a^b \left( \sum_{i=1}^{n} p_i y_i' - H \right) dx, \tag{36}$$

in which the $y_i$ and $p_i$ are regarded as 2n independent functions. We want
the new variables $Y_i$ and $P_i$ to satisfy the equations (34) for some function
$H^*$. This suggests that we write the functional which has (34) as its Euler
equations. This functional is

$$J^*[Y_1, \ldots, Y_n, P_1, \ldots, P_n] = \int_a^b \left( \sum_{i=1}^{n} P_i Y_i' - H^* \right) dx, \tag{37}$$

where $Y_i$ and $P_i$ are the functions of x, $y_i$ and $p_i$ defined by (33), and $Y_i'$
is the derivative of $Y_i$. Thus, the functionals (36) and (37) represent two
different variational problems involving the same variables $y_i$ and $p_i$, and
the requirement that the new system of canonical equations (34) be equivalent 
to the old system (35), i.e., that it be possible to obtain (34) from (35) by 
making a change of variables (33), is the same as the requirement that the 
variational problems corresponding to the functionals (36) and (37) be 
equivalent. 

In the remarks made on p. 36, it was shown that two variational problems
are equivalent (i.e., have the same extremals) if the integrands of the
corresponding functionals differ from each other by a total differential, which in
this case means that

$$\sum_{i=1}^{n} p_i\,dy_i - H\,dx = \sum_{i=1}^{n} P_i\,dY_i - H^*\,dx + d\Phi(x, y_1, \ldots, y_n, p_1, \ldots, p_n) \tag{38}$$

for some function $\Phi$. Thus, if a given transformation (33) from the variables
x, $y_i$, $p_i$ to the variables x, $Y_i$, $P_i$ is such that there exists a function $\Phi$
satisfying the condition (38), then the transformation (33) is canonical. In this
case, the function $\Phi$ defined by (38) is called the generating function of the
canonical transformation. The function $\Phi$ is only specified to within an
additive constant, since, as is well known, a function is only specified by its
total differential to within an additive constant.

To justify the term "generating function," we must show how to actually
find the canonical transformation corresponding to a given generating
function $\Phi$. This is easily done. Writing (38) in the form

$$d\Phi = \sum_{i=1}^{n} p_i\,dy_i - \sum_{i=1}^{n} P_i\,dY_i + (H^* - H)\,dx,$$

we find that⁸

$$p_i = \frac{\partial \Phi}{\partial y_i}, \qquad P_i = -\frac{\partial \Phi}{\partial Y_i}, \qquad H^* = H + \frac{\partial \Phi}{\partial x}. \tag{39}$$

Then (39) is precisely the desired canonical transformation. In fact, the
2n + 1 equations (39) establish the connection between the old variables
$y_i$, $p_i$ and the new variables $Y_i$, $P_i$, and they also give an expression for the
new Hamiltonian $H^*$. Moreover, it is obvious that (39) satisfies the condition
(38), so that the transformation (39) is indeed canonical. If the generating
function $\Phi$ does not depend on x explicitly, then $H^* = H$. In this case,
to obtain the new Hamiltonian $H^*$, we need only replace $y_i$ and $p_i$ in H by
their expressions in terms of $Y_i$ and $P_i$.⁹

8 $\Phi$ is originally a function of x, $y_i$ and $p_i$. However, by using (33), we can write $\Phi$
as a function of the variables x, $y_i$ and $Y_i$.
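A very simple generating function illustrates formulas (39). Taking n = 1 and $\Phi = yY$ (our choice for the illustration), we get $p = \partial\Phi/\partial y = Y$ and $P = -\partial\Phi/\partial Y = -y$, so the transformation interchanges coordinate and momentum up to sign. For the assumed Hamiltonian $H = \tfrac12(p^2 + y^2)$ this gives $H^* = \tfrac12(Y^2 + P^2)$. The sketch below transforms the known solution $y = \cos x$, $p = -\sin x$ and checks by finite differences that the new variables satisfy the canonical equations with $H^*$. All names are hypothetical.

```python
import math

def old_solution(x):
    """A solution (y, p) of the canonical equations for H = (p**2 + y**2) / 2."""
    return math.cos(x), -math.sin(x)

def transform(y, p):
    """Transformation generated by Phi = y*Y: p = dPhi/dy = Y, P = -dPhi/dY = -y."""
    return p, -y            # (Y, P)

def new_solution(x):
    return transform(*old_solution(x))

def residuals(x, h=1e-5):
    """Residuals of dY/dx = dH*/dP and dP/dx = -dH*/dY for H* = (Y**2 + P**2) / 2."""
    Y_up, P_up = new_solution(x + h)
    Y_dn, P_dn = new_solution(x - h)
    Y, P = new_solution(x)
    dY_dx = (Y_up - Y_dn) / (2 * h)
    dP_dx = (P_up - P_dn) / (2 * h)
    return dY_dx - P, dP_dx + Y

worst = max(abs(r) for x in (0.0, 0.4, 1.3, 2.9) for r in residuals(x))
print(worst)
```

The residuals vanish up to the accuracy of the difference quotient, so the canonical form is preserved in this instance.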

In writing (39), we assumed that the generating function is specified as a
function of x, the old variables $y_i$ and the new variables $Y_i$:

$$\Phi = \Phi(x, y_1, \ldots, y_n, Y_1, \ldots, Y_n).$$

It may be more convenient to express the generating function in terms of
$y_i$ and $P_i$ instead of $y_i$ and $Y_i$. To this end, we rewrite (38) in the form

$$d\left( \Phi + \sum_{i=1}^{n} P_i Y_i \right) = \sum_{i=1}^{n} p_i\,dy_i + \sum_{i=1}^{n} Y_i\,dP_i + (H^* - H)\,dx,$$

thereby obtaining a new generating function

$$\Phi + \sum_{i=1}^{n} P_i Y_i, \tag{40}$$

which is to be regarded as a function of the variables x, $y_i$ and $P_i$. Denoting
(40) by $\Psi(x, y_1, \ldots, y_n, P_1, \ldots, P_n)$, we can write the corresponding
canonical transformation in the form

$$p_i = \frac{\partial \Psi}{\partial y_i}, \qquad Y_i = \frac{\partial \Psi}{\partial P_i}, \qquad H^* = H + \frac{\partial \Psi}{\partial x}. \tag{41}$$


20. Noether’s Theorem


In Sec. 17 we proved that the system of Euler equations corresponding
to the functional

$$\int_a^b F(y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{42}$$

where F does not depend on x explicitly, has the first integral

$$H = -F + \sum_{i=1}^{n} y_i' F_{y_i'}.$$

It is clear that the statement "F does not depend on x explicitly" is equivalent
to the statement "F, and hence the integral (42), remains the same if we
replace x by the new variable

$$x^* = x + \varepsilon, \tag{43}$$

where $\varepsilon$ is an arbitrary constant." It follows that H is a first integral of
the system of Euler equations corresponding to the functional (42) if and
only if (42) is invariant under the transformation (43).¹⁰

9 A similar remark holds for the function $\Psi$ in (41).

We now show that even in the general case, there is a connection between
the existence of certain first integrals of a system of Euler equations and the
invariance of the corresponding functional under certain transformations
of the variables x, $y_1, \ldots, y_n$. We begin by defining more precisely what
is meant by the invariance of a functional under some set of transformations.
Suppose we are given a functional¹¹

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx,$$

which we write in the concise form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \tag{44}$$

where now y indicates the n-dimensional vector $(y_1, \ldots, y_n)$ and y' the
n-dimensional vector $(y_1', \ldots, y_n')$. Consider the transformation

$$\begin{aligned} x^* &= \Phi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Phi(x, y, y'), \\ y_i^* &= \Psi_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Psi_i(x, y, y'), \end{aligned} \tag{45}$$

where i = 1, ..., n. The transformation (45) carries the curve $\gamma$, with the
vector equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

into another curve $\gamma^*$. In fact, replacing y, y' in (45) by y(x), y'(x), and
eliminating x from the resulting n + 1 equations, we obtain the vector
equation

$$y^* = y^*(x^*) \qquad (x_0^* \leq x^* \leq x_1^*)$$

for $\gamma^*$, where $y^* = (y_1^*, \ldots, y_n^*)$.


DEFINITION. The functional (44) is said to be invariant under the
transformation (45) if $J[\gamma^*] = J[\gamma]$, i.e., if

$$\int_{x_0^*}^{x_1^*} F\left( x^*, y^*, \frac{dy^*}{dx^*} \right) dx^* = \int_{x_0}^{x_1} F\left( x, y, \frac{dy}{dx} \right) dx.$$


10 The fact that H is a first integral only if (42) is invariant under the transformation
(43) follows from the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

(see footnote 4, p. 70), since $\partial H/\partial x = 0$ only if $\partial F/\partial x = 0$.

11 To avoid confusion in what follows, the reader should note that the subscripts can
play two different roles; when indexing x, they refer to different values, while when indexing
y, they refer to different functions. For example, the $y_i^*$ are new functions, while $x_0^*$ and
$x_1^*$ are the new positions of the end points of the interval $[x_0, x_1]$.

Example 1. The functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx$$

is invariant under the transformation

$$x^* = x + \varepsilon, \qquad y^* = y, \tag{46}$$

where $\varepsilon$ is an arbitrary constant. In fact, given a curve $\gamma$ with equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

the "transformed" curve $\gamma^*$, i.e., the curve obtained from $\gamma$ by shifting it a
distance $\varepsilon$ along the x-axis, has the equation

$$y^* = y(x^* - \varepsilon) = y^*(x^*) \qquad (x_0 + \varepsilon \leq x^* \leq x_1 + \varepsilon),$$

and then

$$J[\gamma^*] = \int_{x_0^*}^{x_1^*} \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} \left[ \frac{dy(x^* - \varepsilon)}{dx^*} \right]^2 dx^* = \int_{x_0}^{x_1} \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma].$$
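
Example 2. The integral

$$J[y] = \int_{x_0}^{x_1} x y'^2\,dx$$

is an example of a functional which is not invariant under the transformation
(46). In fact, carrying out the same calculations as in Example 1, we obtain

$$J[\gamma^*] = \int_{x_0^*}^{x_1^*} x^* \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} x^* \left[ \frac{dy(x^* - \varepsilon)}{dx^*} \right]^2 dx^*$$

$$= \int_{x_0}^{x_1} (x + \varepsilon) \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma] + \varepsilon \int_{x_0}^{x_1} \left[ \frac{dy(x)}{dx} \right]^2 dx \neq J[\gamma].$$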
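Examples 1 and 2 can be reproduced by numerical quadrature. In the sketch below the curve is taken to be $y = \sin x$ on [0, 1] and the shift is $\varepsilon = 0.3$; both are arbitrary choices, and `simpson` and `functional` are hypothetical helper names. The quantities `J1`, `J1_star` correspond to Example 1 and `J2`, `J2_star` to Example 2.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def functional(weight, dy, a, b):
    """Integral of weight(x) * y'(x)**2 over [a, b]."""
    return simpson(lambda x: weight(x) * dy(x) ** 2, a, b)

eps, x0, x1 = 0.3, 0.0, 1.0

def dy(x):
    """Derivative of the assumed curve y = sin x."""
    return math.cos(x)

def dy_shifted(x):
    """Derivative of the shifted curve y*(x*) = y(x* - eps)."""
    return math.cos(x - eps)

def one(x):
    return 1.0

def identity(x):
    return x

J1 = functional(one, dy, x0, x1)
J1_star = functional(one, dy_shifted, x0 + eps, x1 + eps)
J2 = functional(identity, dy, x0, x1)
J2_star = functional(identity, dy_shifted, x0 + eps, x1 + eps)
print(J1_star - J1, J2_star - J2, eps * J1)
```

The first difference is zero up to rounding, while the second equals $\varepsilon$ times the integral of $y'^2$, exactly as in the calculation of Example 2.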


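Suppose now that we have a family of transformations

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon), \end{aligned} \tag{47}$$

depending on a parameter $\varepsilon$, where the functions $\Phi$ and $\Psi_i$ (i = 1, ..., n)
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$\begin{aligned} \Phi(x, y, y'; 0) &= x, \\ \Psi_i(x, y, y'; 0) &= y_i. \end{aligned} \tag{48}$$

Then we have the following result: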

THEOREM (Noether). If the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx \tag{49}$$

is invariant under the family of transformations (47) for arbitrary $x_0$ and
$x_1$, then

$$\sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi = \text{const} \tag{50}$$

along each extremal of J[y], where

$$\varphi(x, y, y') = \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}, \qquad \psi_i(x, y, y') = \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}. \tag{51}$$

In other words, every one-parameter family of transformations leaving
J[y] invariant leads to a first integral of its system of Euler equations.


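Proof. Suppose $\varepsilon$ is a small quantity. Then, by Taylor's theorem,
we have¹²

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon) = \Phi(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon) = \Psi_i(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \end{aligned}$$

or using (48) and (51),

$$\begin{aligned} x^* &= x + \varepsilon \varphi(x, y, y') + o(\varepsilon), \\ y_i^* &= y_i + \varepsilon \psi_i(x, y, y') + o(\varepsilon). \end{aligned} \tag{52}$$

Assuming that the curve

$$y_i = y_i(x) \qquad (1 \leq i \leq n)$$

is an extremal of J[y], we can use formula (11) of Sec. 13 to write an
expression for the variation of J[y] corresponding to the transformation
(52). Since in the present case¹³

$$\delta x = \varepsilon \varphi, \qquad \delta y_i = \varepsilon \psi_i,$$

the result is

$$\delta J = \varepsilon \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0}^{x = x_1}.$$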


12 As usual, $\eta = o(\varepsilon)$ means that $\eta/\varepsilon \to 0$ as $\varepsilon \to 0$.

13 Here $\delta x$, $\delta y_i$ mean the principal linear parts (relative to $\varepsilon$) of the increments $\Delta x$, $\Delta y_i$
of x, $y_i$, and not simply $\Delta x$, $\Delta y_i$ as in Sec. 13. It is easy to see that this change in
interpretation has no effect on the final result, and has the advantage of making it unnecessary
to bother with infinitesimals of higher order.

Since by hypothesis, J[y] is invariant under (52), $\delta J$ vanishes, i.e.,

$$\left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0} = \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_1}.$$

The fact that (50) holds along each extremal now follows from the
arbitrariness of $x_0$ and $x_1$.


Remark. In terms of the canonical variables $p_i$ and H, equation (50)
becomes simply

$$\sum_{i=1}^{n} p_i \psi_i - H \varphi = \text{const}. \tag{53}$$


Example 3. Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(y, y')\,dx, \tag{54}$$

whose integrand does not depend on x explicitly. Then, by exactly the
same argument as given in Example 1, J[y] is invariant under the
one-parameter family of transformations

$$x^* = x + \varepsilon, \qquad y_i^* = y_i. \tag{55}$$

In this case,

$$\varphi = 1, \qquad \psi_i = 0,$$

and (53) reduces to just

$$H = \text{const},$$

i.e., the Hamiltonian H is constant along each extremal of J[y]. Thus, we
again obtain a result already proved in Sec. 17: For a functional of the
form (54), which does not depend on x explicitly, the Hamiltonian is a first
integral of the system of Euler equations.


21. The Principle of Least Action 


We now apply the general results obtained in the preceding sections to 
some mechanical problems. Suppose we are given a system of n particles 
(mass points), where no constraints whatsoever are imposed on the system. 
Let the ith particle have mass $m_i$ and coordinates $x_i$, $y_i$, $z_i$ (i = 1, ..., n).
Then the kinetic energy of the system is¹⁴

$$T = \frac{1}{2} \sum_{i=1}^{n} m_i (\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2). \tag{56}$$


14 Here t denotes the time, and the overdot denotes differentiation with respect to t.




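We assume that the system has potential energy U, i.e., that there exists a
function

$$U = U(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n) \tag{57}$$

such that the force acting on the ith particle has components

$$X_i = -\frac{\partial U}{\partial x_i}, \qquad Y_i = -\frac{\partial U}{\partial y_i}, \qquad Z_i = -\frac{\partial U}{\partial z_i}.$$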

Next, we introduce the expression 
L=T-U, (58) 


called the Lagrangian (function) of the system of particles. Obviously, L is
a function of the time t and of the positions $(x_i, y_i, z_i)$ and velocities $(\dot{x}_i, \dot{y}_i, \dot{z}_i)$
of the n particles in the system.

Suppose that at time $t_0$ the system is in some fixed position. Then the
subsequent evolution of the system in time is described by a curve

$$x_i = x_i(t), \qquad y_i = y_i(t), \qquad z_i = z_i(t) \qquad (i = 1, \ldots, n)$$

in a space of 3n dimensions. It can be shown that among all curves passing
through the point corresponding to the initial position of the system, the
curve which actually describes the motion of the given system, under the
influence of the forces acting upon it, satisfies the following condition,
known as the principle of least action:


THEOREM. The motion of a system of n particles during the time
interval $[t_0, t_1]$ is described by those functions $x_i(t)$, $y_i(t)$, $z_i(t)$, $1 \leq i \leq n$,
for which the integral

$$\int_{t_0}^{t_1} L\,dt, \tag{59}$$

called the action, is a minimum.


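Proof. We show that the principle of least action implies the usual
equations of motion for a system of n particles. If the functional (59)
has a minimum, then the Euler equations

$$\begin{aligned} \frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} &= 0, \\ \frac{\partial L}{\partial y_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{y}_i} &= 0, \\ \frac{\partial L}{\partial z_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{z}_i} &= 0 \end{aligned} \tag{60}$$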

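must be satisfied for i = 1, ..., n. Bearing in mind that the potential
energy U depends only on t, $x_i$, $y_i$, $z_i$, and not on $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$, while T is a
sum of squares of the velocity components $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$ (with coefficients
$\tfrac12 m_i$), we can write the equations (60) in the form

$$\begin{aligned} -\frac{\partial U}{\partial x_i} - \frac{d}{dt}\, m_i \dot{x}_i &= 0, \\ -\frac{\partial U}{\partial y_i} - \frac{d}{dt}\, m_i \dot{y}_i &= 0, \\ -\frac{\partial U}{\partial z_i} - \frac{d}{dt}\, m_i \dot{z}_i &= 0. \end{aligned} \tag{61}$$

Finally, since the derivatives

$$-\frac{\partial U}{\partial x_i}, \qquad -\frac{\partial U}{\partial y_i}, \qquad -\frac{\partial U}{\partial z_i}$$

are the components of the force acting on the ith particle, the system
(61) reduces to

$$m_i \ddot{x}_i = X_i, \qquad m_i \ddot{y}_i = Y_i, \qquad m_i \ddot{z}_i = Z_i,$$

which are just Newton's equations of motion for a system of n particles,
subject to no constraints.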
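The principle can be tried out on the simplest possible case. The sketch below assumes a single particle of unit mass in a uniform gravitational field, with Lagrangian $L = \tfrac12 \dot{y}^2 - gy$ and fixed end points $y(0) = y(T) = 0$; the values of g and T, the perturbation $c \sin(\pi t/T)$ and all function names are our own illustrative choices. The Newtonian path is $y = \tfrac12 g t (T - t)$, and the code compares its action with that of the perturbed paths.

```python
import math

G = 9.8      # gravitational acceleration (assumed value)
T = 1.0      # duration of the motion (assumed value)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def action(c):
    """Action along y(t) = g*t*(T - t)/2 + c*sin(pi*t/T), which keeps the end points fixed."""
    def lagrangian(t):
        y = 0.5 * G * t * (T - t) + c * math.sin(math.pi * t / T)
        v = 0.5 * G * (T - 2.0 * t) + c * (math.pi / T) * math.cos(math.pi * t / T)
        return 0.5 * v * v - G * y
    return simpson(lagrangian, 0.0, T)

true_action = action(0.0)
amplitudes = (-0.2, -0.1, 0.1, 0.2)
excess = [action(c) - true_action for c in amplitudes]
print(excess)
```

Every perturbed path has a larger action, and for this Lagrangian the excess works out to $c^2 \pi^2 / (4T)$, since the first-order term vanishes on the true path.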


Remark 1. The principle of least action remains valid in the case where the 
system of particles is subject to constraints, except that then the admissible 
curves, for which the functional (59) is considered, have to satisfy the con- 
straints. In other words, in this case, application of the principle of least 
action leads to a variational problem with subsidiary conditions. 


Remark 2. Actually, as we shall see later (Sec. 36.2), the principle of
least action only holds for sufficiently small time intervals $[t_0, t_1]$, and has
to be modified for continuous mechanical systems.


22. Conservation Laws 


We have just seen that the equations of motion of a mechanical system
consisting of n particles, with kinetic energy (56), potential energy (57) and
Lagrangian (58), can be obtained from the principle of least action, i.e., by
minimizing the integral

$$\int_{t_0}^{t_1} L\,dt = \int_{t_0}^{t_1} (T - U)\,dt. \tag{62}$$

The canonical variables corresponding to the functional (62) turn out to be

$$p_{ix} = \frac{\partial L}{\partial \dot{x}_i} = m_i \dot{x}_i, \qquad p_{iy} = \frac{\partial L}{\partial \dot{y}_i} = m_i \dot{y}_i, \qquad p_{iz} = \frac{\partial L}{\partial \dot{z}_i} = m_i \dot{z}_i,$$

which are just the components of the momentum of the ith particle.¹⁵ In
terms of $p_{ix}$, $p_{iy}$ and $p_{iz}$, we have

$$H = \sum_{i=1}^{n} (\dot{x}_i p_{ix} + \dot{y}_i p_{iy} + \dot{z}_i p_{iz}) - L = 2T - (T - U) = T + U,$$

so that H is the total energy of the system.

Using the form of the integrand in (62), we can find various functions which
maintain constant values along each trajectory of the system, thereby
obtaining so-called conservation laws.


1. Conservation of energy. Suppose the given system is conservative,
which means that the Lagrangian L (or more precisely, the potential
energy U) does not depend on time explicitly. Then, as shown in
Sec. 17 (see also Sec. 20, Example 3), H = const along each extremal,
i.e., the total energy of a conservative system does not change during
the motion of the system.


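2. Conservation of momentum. First, we recall that according to Noether's
theorem (Sec. 20), invariance of the functional (49) under the family of
transformations

$$x^* = \Phi(x, y, y'; \varepsilon) = x, \qquad y_i^* = \Psi_i(x, y, y'; \varepsilon)$$

implies that the corresponding system of Euler equations has the first
integral

$$\sum_{i=1}^{n} F_{y_i'} \psi_i = \text{const},$$

where

$$\psi_i(x, y, y') = \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0},$$

since in this case,

$$\varphi(x, y, y') = \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0.$$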

Therefore, the invariance of the functional (62) under the transformation

$$x_i^* = x_i + \varepsilon, \qquad y_i^* = y_i, \qquad z_i^* = z_i$$

implies that

$$\sum_{i=1}^{n} \frac{\partial L}{\partial \dot{x}_i} = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} p_{ix} = \text{const}.$$

Similarly, it follows from the invariance of (62) under displacements
along the y-axis that

$$\sum_{i=1}^{n} p_{iy} = \text{const},$$

and from the invariance of (62) under displacements along the z-axis that

$$\sum_{i=1}^{n} p_{iz} = \text{const}.$$

The vector P with components

$$P_x = \sum_{i=1}^{n} p_{ix}, \qquad P_y = \sum_{i=1}^{n} p_{iy}, \qquad P_z = \sum_{i=1}^{n} p_{iz}$$

is called the total momentum of the system. Thus, we have just proved
that the total momentum is conserved during the motion of the system
if the integral (62) is invariant under parallel displacements. [It is clear
from these considerations that the invariance of (62) under displacements
along any coordinate axis, e.g., along the x-axis, implies that
the corresponding component of the total momentum is conserved.]

15 By analogy with mechanical problems, the variables $p_i = F_{y_i'}$ are often called the
momenta, regardless of the interpretation of the integrand F appearing in the functional (1).


3. Conservation of angular momentum. Suppose the integral (62) is
invariant under rotations about the z-axis, i.e., under coordinate
transformations of the form

$$\begin{aligned} x_i^* &= x_i \cos \varepsilon + y_i \sin \varepsilon, \\ y_i^* &= -x_i \sin \varepsilon + y_i \cos \varepsilon, \\ z_i^* &= z_i. \end{aligned}$$

In this case,

$$\psi_{ix} = \left. \frac{\partial x_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = y_i, \qquad \psi_{iy} = \left. \frac{\partial y_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = -x_i, \qquad \psi_{iz} = \left. \frac{\partial z_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0,$$

and hence Noether's theorem implies that

$$\sum_{i=1}^{n} \left( \frac{\partial L}{\partial \dot{x}_i}\, y_i - \frac{\partial L}{\partial \dot{y}_i}\, x_i \right) = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} (p_{ix} y_i - p_{iy} x_i) = \text{const}. \tag{63}$$

Each term in this sum represents the z-component of the vector product
$p_i \times r_i$, where $r_i = (x_i, y_i, z_i)$ is the position vector and $p_i = (p_{ix}, p_{iy}, p_{iz})$
the momentum of the ith particle. The vector $p_i \times r_i$ is called the
angular momentum of the ith particle, about the origin of coordinates,
and (63) means that the sum of the z-components of the angular
momenta of the separate particles, i.e., the z-component of the total
angular momentum (of the whole system) is a constant. Similar assertions
hold for the x and y-components of the total angular momentum,
provided that the integral (62) is invariant under rotations about the x
and y-axes. Thus, we have proved that the total angular momentum
does not change during the motion of the system if (62) is invariant
under all rotations.


Example 1. Consider the motion of a particle which is attracted to a
fixed point, according to some law. In this case, energy is conserved, since
L is time-invariant, and angular momentum is also conserved, since L is
invariant under rotations. However, momentum is not conserved during
the motion of the particle.
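The three statements of Example 1 can be watched in a small simulation. The sketch below assumes a particle of unit mass moving in the plane under the attracting potential $U = -1/r$, which is only one possible "law" of attraction, and integrates Newton's equations with a Runge-Kutta scheme. It records the largest deviation of the energy, of the quantity $p_x y - p_y x$ from (63), and of the momentum vector from their initial values. All names and numerical values are illustrative.

```python
import math

def rhs(s):
    """Newton's equations for unit mass in the assumed potential U = -1/r."""
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return [vx, vy, -x / r3, -y / r3]

def rk4_step(s, h):
    """One classical Runge-Kutta step for the state (x, y, vx, vy)."""
    k1 = rhs(s)
    k2 = rhs([a + h / 2 * b for a, b in zip(s, k1)])
    k3 = rhs([a + h / 2 * b for a, b in zip(s, k2)])
    k4 = rhs([a + h * b for a, b in zip(s, k3)])
    return [a + h / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(s, k1, k2, k3, k4)]

def energy(s):
    """Total energy T + U."""
    x, y, vx, vy = s
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def angular_momentum_z(s):
    """The quantity p_x*y - p_y*x appearing in (63), for unit mass."""
    x, y, vx, vy = s
    return vx * y - vy * x

state = [1.0, 0.0, 0.0, 0.8]
e0, l0, p0 = energy(state), angular_momentum_z(state), state[2:]
energy_drift = angular_drift = momentum_drift = 0.0
for _ in range(5000):
    state = rk4_step(state, 0.001)
    energy_drift = max(energy_drift, abs(energy(state) - e0))
    angular_drift = max(angular_drift, abs(angular_momentum_z(state) - l0))
    momentum_drift = max(momentum_drift,
                         math.hypot(state[2] - p0[0], state[3] - p0[1]))
print(energy_drift, angular_drift, momentum_drift)
```

The first two numbers are at the level of integration error, while the momentum changes by an amount comparable to its own size, as Example 1 asserts.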


Example 2. A particle is attracted to a homogeneous linear mass distri- 
bution lying along the z-axis. In this case, the following quantities are 
conserved: 


1. The energy (since L is independent of time); 
2. The z-component of the momentum; 
3. The z-component of the angular momentum. 


23. The Hamilton-Jacobi Equation. Jacobi's Theorem¹⁶


Consider the functional

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{64}$$

defined on the curves lying in some region R, and suppose that one and only
one extremal of (64) goes through two arbitrary points A and B. The
integral

$$S = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{65}$$

evaluated along the extremal joining the points

$$A = (x_0, y_1^0, \ldots, y_n^0), \qquad B = (x_1, y_1^1, \ldots, y_n^1), \tag{66}$$

is called the geodetic distance between A and B. The quantity S is obviously
a single-valued function of the coordinates of the points A and B.


16 In this section, we drop the vector notation introduced in Sec. 20, and revert to the
more explicit notation used earlier. The vector notation will be used again later
(e.g., in Sec. 29).

Example 1. If the functional J is arc length, S is the distance (in the usual
sense) between the points A and B.

Example 2. Consider the propagation of light in an inhomogeneous and
anisotropic medium, where it is assumed that the velocity of light at any
point depends both on the coordinates of the point and on the direction of
propagation, i.e.,

$$v = v(x, y, z, \dot{x}, \dot{y}, \dot{z}).$$

The time it takes light to go from one point to another along some curve

$$x = x(t), \qquad y = y(t), \qquad z = z(t)$$

is given by the integral

$$T = \int_{t_0}^{t_1} \frac{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{v}\,dt. \tag{67}$$

According to Fermat's principle, light propagates in any medium along the
curve for which the transit time T is smallest, i.e., along the extremal of the
functional (67). Thus, for the functional (67), S is the time it takes light
to go from the point A to the point B.


Example 3. Consider a mechanical system with Lagrangian L. According
to Sec. 21, the integral

$$\int_{t_0}^{t_1} L(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n)\,dt$$

evaluated along the extremal passing through two given points, i.e., two
configurations of the system, is the "least action" corresponding to the
motion of the system from the first configuration to the second.

If the initial point A is regarded as fixed and the final point $B = (x, y_1, \ldots, y_n)$
is regarded as variable,¹⁷ then in the region R,

$$S = S(x, y_1, \ldots, y_n) \tag{68}$$

is a single-valued function of the coordinates of the point B. We now
derive a differential equation satisfied by the function (68). We first
calculate the partial derivatives

$$\frac{\partial S}{\partial x}, \qquad \frac{\partial S}{\partial y_i} \qquad (i = 1, \ldots, n),$$

by writing down the total differential of the function S, i.e., the principal
linear part of the increment

$$\Delta S = S(x + dx, y_1 + dy_1, \ldots, y_n + dy_n) - S(x, y_1, \ldots, y_n).$$

Since, by definition, $\Delta S$ is the difference

$$J[\gamma^*] - J[\gamma],$$

where $\gamma$ is the extremal going from A to the point $(x, y_1, \ldots, y_n)$ and $\gamma^*$ is
the extremal going from A to the point $(x + dx, y_1 + dy_1, \ldots, y_n + dy_n)$,
we have

$$dS = \delta J,$$

where the "unvaried" curve is the extremal $\gamma$ and the initial point A is held
fixed. (The fact that the "varied" curve $\gamma^*$ is also an extremal is not
important here.)

17 Since B is now variable, we drop the superscript in the second of the formulas (66).

Thus, using formula (12) of Sec. 13 for the general variation of a functional, 
we obtain 


dS(x, yi, oe «> Yn) = bd = > PA dy, _ H dx, (69) 
1=1 


where (69) is evaluated at the point B. It follows that 


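$$\frac{\partial S}{\partial x} = -H, \qquad \frac{\partial S}{\partial y_i} = p_i, \tag{70}$$

where¹⁸

$$p_i = p_i(x, y_1, \ldots, y_n) = F_{y_i'}[x, y_1, \ldots, y_n, y_1'(x), \ldots, y_n'(x)] \tag{71}$$

and

$$H = H[x, y_1, \ldots, y_n, p_1(x, y_1, \ldots, y_n), \ldots, p_n(x, y_1, \ldots, y_n)]$$

are functions of $x, y_1, \ldots, y_n$. Then from (70) we find that S, as a function
of the coordinates of the point B, satisfies the equation

$$\frac{\partial S}{\partial x} + H\left(x, y_1, \ldots, y_n, \frac{\partial S}{\partial y_1}, \ldots, \frac{\partial S}{\partial y_n}\right) = 0. \tag{72}$$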
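
Equation (72) can be checked numerically in a simple case. The sketch below assumes the integrand F = y'²/2 and the fixed initial point A = (0, 0); both are illustrative choices, not taken from the text. For this integrand the extremals are straight lines, p = y' and H = p²/2, so S(x, y) is the integral of F along the segment from A to (x, y). The helper names (`S`, `H`, `partial`, `hj_residual`) are hypothetical.

```python
# Illustrative assumption: F(x, y, y') = y'**2 / 2 and A = (0, 0).
# Extremals are straight lines, so the extremal from A to (x, y) has slope y / x.

def S(x, y):
    """Integral of F along the straight line from (0, 0) to (x, y), with x > 0."""
    slope = y / x
    return 0.5 * slope**2 * x


def H(x, y, p):
    """Hamiltonian for F = y'**2 / 2: p = y' and H = -F + y' * p = p**2 / 2."""
    return 0.5 * p**2


def partial(f, args, i, eps=1e-6):
    """Central finite-difference approximation to the i-th partial derivative."""
    up = list(args)
    dn = list(args)
    up[i] += eps
    dn[i] -= eps
    return (f(*up) - f(*dn)) / (2 * eps)


def hj_residual(x, y):
    """Left-hand side of (72): dS/dx + H(x, y, dS/dy)."""
    s_x = partial(S, (x, y), 0)
    s_y = partial(S, (x, y), 1)
    return s_x + H(x, y, s_y)


if __name__ == "__main__":
    for point in [(1.0, 0.5), (2.0, -3.0), (0.7, 1.9)]:
        print(point, hj_residual(*point))
```

The residual should vanish up to finite-difference error, and `partial(S, (x, y), 1)` reproduces the slope y/x, in agreement with the second of the formulas (70).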


The partial differential equation (72), which is in general nonlinear, is called 
the Hamilton-Jacobi equation. There is an intimate connection between the 
Hamilton-Jacobi equation and the canonical Euler equations. In fact, the 
canonical equations represent the so-called characteristic system associated 
with equation (72).¹⁹ We shall approach this matter from a somewhat
different point of view, by establishing a connection between solutions of the 
Hamilton-Jacobi equation and first integrals of the system of Euler equations: 


THEOREM 1. Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_m) \tag{73}$$


18 In (71), $y_i'(x)$ denotes the derivative $dy_i/dx$ calculated at the point B for the
extremal $\gamma$ going from A to B.

19 See e.g., R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. II,
Interscience, Inc., New York (1962), Chap. 2, Sec. 8.
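be a solution, depending on m (≤ n) parameters $\alpha_1, \ldots, \alpha_m$, of the
Hamilton-Jacobi equation (72). Then each derivative

$$\frac{\partial S}{\partial \alpha_i} \qquad (i = 1, \ldots, m)$$

is a first integral of the system of canonical Euler equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i},$$

i.e.,

$$\frac{\partial S}{\partial \alpha_i} = \text{const} \qquad (i = 1, \ldots, m)$$

along each extremal.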


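Proof. We have to show that

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = 0 \qquad (i = 1, \ldots, m) \tag{74}$$

along each extremal. Calculating the left-hand side of (74), we find
that

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = \frac{\partial^2 S}{\partial x \, \partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} \frac{dy_k}{dx}. \tag{75}$$

Substituting (73) into the Hamilton-Jacobi equation (72), and
differentiating the result with respect to $\alpha_i$, we obtain

$$\frac{\partial^2 S}{\partial x \, \partial \alpha_i} = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i}. \tag{76}$$

Then substitution of (76) into (75) gives

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} \frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} \left(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\right).$$

Since

$$\frac{dy_k}{dx} = \frac{\partial H}{\partial p_k} \qquad (k = 1, \ldots, n)$$

along each extremal, it follows that (74) holds along each extremal,
which proves the theorem.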
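
THEOREM 2 (Jacobi). Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) \tag{77}$$

be a complete integral of the Hamilton-Jacobi equation (72), i.e., a general
solution of (72) depending on n parameters $\alpha_1, \ldots, \alpha_n$. Moreover, let the
determinant of the n × n matrix

$$\left\| \frac{\partial^2 S}{\partial \alpha_i \, \partial y_k} \right\| \tag{78}$$

be nonzero, and let $\beta_1, \ldots, \beta_n$ be n arbitrary constants. Then the
functions

$$y_i = y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n) \qquad (i = 1, \ldots, n) \tag{79}$$

defined by the relations

$$\frac{\partial}{\partial \alpha_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) = \beta_i \qquad (i = 1, \ldots, n), \tag{80}$$

together with the functions

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) \qquad (i = 1, \ldots, n), \tag{81}$$

where the $y_i$ are given by (79), constitute a general solution of the canonical
system

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \tag{82}$$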


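Proof 1. According to Theorem 1, the n relations (80) correspond
to first integrals of the canonical system (82). To obtain the general
solution of (82), we first use (80) to define the n functions (79) [this is
possible since (78) has a nonvanishing determinant], and then use (81)
to define the n functions $p_i$. To show that the functions $y_i$ and $p_i$ so
defined actually satisfy the canonical equations (82), we argue as follows:
Differentiating (80) with respect to x, where the $y_i$ are regarded as
functions of x [cf. (79)], we obtain

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = \frac{\partial^2 S}{\partial x \, \partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} \frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial \alpha_i} \left(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\right),$$

where in the last step we have used (76). Since the determinant of the
matrix (78) is nonzero, it follows that

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i} \qquad (i = 1, \ldots, n), \tag{83}$$

which is just the first set of equations (82).
Next, we differentiate (81) with respect to x, obtaining

$$\frac{dp_i}{dx} = \frac{d}{dx}\left(\frac{\partial S}{\partial y_i}\right) = \frac{\partial^2 S}{\partial x \, \partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial y_i} \frac{dy_k}{dx} = \frac{\partial^2 S}{\partial x \, \partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k \, \partial y_i} \frac{\partial H}{\partial p_k},$$

where we have used (83). Then, taking account of (81) and differentiating
the Hamilton-Jacobi equation (72) with respect to $y_i$, we
find that

$$\frac{\partial^2 S}{\partial x \, \partial y_i} = -\frac{\partial H}{\partial y_i} - \sum_{k=1}^{n} \frac{\partial H}{\partial p_k} \frac{\partial^2 S}{\partial y_k \, \partial y_i}.$$

A comparison of the last two equations shows that

$$\frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n),$$

which is just the second set of equations (82).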


Proof 2. Our second proof of Jacobi’s theorem is based on the use
of a canonical transformation. Let (77) be a complete integral of the
Hamilton-Jacobi equation. We make a canonical transformation of
the system (82), choosing the function (77) as the generating function,
$\alpha_1, \ldots, \alpha_n$ as the new momenta (cf. footnote 15, p. 86), and $\beta_1, \ldots, \beta_n$
as the new coordinates. Then, according to formula (41) of Sec. 19,

$$p_i = \frac{\partial S}{\partial y_i}, \qquad \beta_i = \frac{\partial S}{\partial \alpha_i}, \qquad H^* = H + \frac{\partial S}{\partial x}.$$

But since the function S satisfies the Hamilton-Jacobi equation, we
have

$$H^* = H + \frac{\partial S}{\partial x} = 0.$$

Therefore, in the new variables, the canonical equations become

$$\frac{d\alpha_i}{dx} = 0, \qquad \frac{d\beta_i}{dx} = 0,$$

from which it follows that $\alpha_i = \text{const}$, $\beta_i = \text{const}$ along each extremal.
Thus, we again obtain the same n first integrals

$$\frac{\partial S}{\partial \alpha_i} = \beta_i$$

of the system of Euler equations. If we now use these equations to determine
the functions (79) of the 2n parameters $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$,
and if, as before, we set

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n),$$

where the $y_i$ are given by (79), we obtain 2n functions

$$y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n), \qquad p_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n),$$

which constitute a general solution of the canonical system (82).
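
Jacobi's theorem can be traced through numerically in the simplest case n = 1. The sketch below assumes the Hamiltonian H = p²/2 (corresponding to the integrand F = y'²/2) and the complete integral S = αy − α²x/2 of the equation ∂S/∂x + ½(∂S/∂y)² = 0; these are illustrative choices, not taken from the text, and all function names are hypothetical. Relation (80) is solved for y by bisection, relation (81) then gives p, and the two canonical equations (82) are checked by finite differences.

```python
# Illustrative assumption: H = p**2 / 2 with the complete integral
# S(x, y, alpha) = alpha * y - alpha**2 * x / 2.

def S(x, y, alpha):
    return alpha * y - 0.5 * alpha**2 * x


def H(x, y, p):
    return 0.5 * p**2


def diff(f, args, i, eps=1e-5):
    """Central finite-difference approximation to the i-th partial derivative."""
    up = list(args)
    dn = list(args)
    up[i] += eps
    dn[i] -= eps
    return (f(*up) - f(*dn)) / (2 * eps)


def solve_y(x, alpha, beta, lo=-100.0, hi=100.0):
    """Solve relation (80), dS/dalpha = beta, for y by bisection on [lo, hi]."""
    def g(y):
        return diff(S, (x, y, alpha), 2) - beta

    g_lo = g(lo)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def momentum(x, alpha, beta):
    """Relation (81): p = dS/dy evaluated on the curve defined by (80)."""
    return diff(S, (x, solve_y(x, alpha, beta), alpha), 1)


def canonical_residuals(x, alpha, beta, dx=1e-4):
    """Return (dy/dx - dH/dp, dp/dx + dH/dy); both should be close to zero."""
    y = solve_y(x, alpha, beta)
    p = momentum(x, alpha, beta)
    dy_dx = (solve_y(x + dx, alpha, beta) - solve_y(x - dx, alpha, beta)) / (2 * dx)
    dp_dx = (momentum(x + dx, alpha, beta) - momentum(x - dx, alpha, beta)) / (2 * dx)
    return dy_dx - diff(H, (x, y, p), 2), dp_dx + diff(H, (x, y, p), 1)


if __name__ == "__main__":
    print(solve_y(2.0, 1.5, 0.3))            # y = alpha * x + beta
    print(canonical_residuals(2.0, 1.5, 0.3))
```

In this example (80) reads y − αx = β, so the two-parameter family produced is the family of straight lines y = αx + β with p = α, which is indeed the general solution of dy/dx = p, dp/dx = 0.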
PROBLEMS


1. Use the canonical Euler equations to find the extremals of the functional

$$\int \sqrt{x^2 + y^2} \, \sqrt{1 + y'^2} \, dx,$$

and verify that they agree with those found in Chap. 1, Prob. 22.
Hint. The Hamiltonian is

$$H(x, y, p) = -\sqrt{x^2 + y^2 - p^2},$$


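and the corresponding canonical system

$$\frac{dy}{dx} = \frac{p}{\sqrt{x^2 + y^2 - p^2}}, \qquad \frac{dp}{dx} = \frac{y}{\sqrt{x^2 + y^2 - p^2}}$$

has the first integral

$$p^2 - y^2 = C,$$

where C is a constant.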


2. Consider the action functional

$$J[x] = \frac{1}{2} \int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2) \, dt$$

corresponding to a simple harmonic oscillator, i.e., a particle of mass m
acted upon by a restoring force $-\kappa x$ (cf. Sec. 36.2). Write the canonical
system of Euler equations corresponding to J[x], and interpret them. Calculate
the Poisson brackets [x, p], [x, H] and [p, H]. Is p a first integral of
the canonical Euler equations?


3. Use the principle of least action to give a variational formulation of the
problem of the plane motion of a particle of mass m attracted to the origin
of coordinates by a force inversely proportional to the square of its distance
from the origin. Write the corresponding equations of motion, the Hamiltonian
and the canonical system of Euler equations. Calculate the Poisson
brackets $[r, p_r]$, $[\theta, p_\theta]$, $[p_r, H]$ and $[p_\theta, H]$, where

$$p_r = \frac{\partial L}{\partial \dot{r}}, \qquad p_\theta = \frac{\partial L}{\partial \dot{\theta}}.$$

Is $p_\theta$ a first integral of the canonical Euler equations?


Hint. The action functional is

$$J[r, \theta] = \int_{t_0}^{t_1} \left[ \frac{m}{2} (\dot{r}^2 + r^2 \dot{\theta}^2) + \frac{k}{r} \right] dt,$$

where k is a constant, and r, θ are the polar coordinates of the particle.

4. Verify that the change of variables

$$Y = p, \qquad P = -y$$

is a canonical transformation, and find the corresponding generating function.

5. Verify that the functional J[r, θ] of Prob. 3 is invariant under rotations,
and use Noether’s theorem (in polar coordinates) to find the corresponding
conservation law. What geometric fact does this law express?

Ans. The line segment joining the particle to the origin sweeps out equal 
areas in equal times. 


6. Write and solve the Hamilton-Jacobi equation corresponding to the
functional

$$J[y] = \int_{x_0}^{x_1} y'^2 \, dx,$$

and use the result to determine the extremals of J[y].
Ans. The Hamilton-Jacobi equation is

$$4 \frac{\partial S}{\partial x} + \left(\frac{\partial S}{\partial y}\right)^2 = 0.$$


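7. Write and solve the Hamilton-Jacobi equation corresponding to the
functional

$$J[y] = \int_{x_0}^{x_1} f(y) \sqrt{1 + y'^2} \, dx,$$

and use the result to find the extremals of J[y].
Ans. The Hamilton-Jacobi equation is

$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = f^2(y),$$

with solution

$$S = \alpha x + \int_{y_0}^{y} \sqrt{f^2(\eta) - \alpha^2} \, d\eta + \beta.$$

The extremals are

$$x - \alpha \int_{y_0}^{y} \frac{d\eta}{\sqrt{f^2(\eta) - \alpha^2}} = \text{const}.$$
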
8. Use the Hamilton-Jacobi equation to find the extremals of the functional 
of Prob. 1. 
Hint. Try a solution of the form $S = \frac{1}{2}(Ax^2 + 2Bxy + Cy^2)$.


9. What functional leads to the Hamilton-Jacobi equation

$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = 1?$$


10. Prove that the Hamilton-Jacobi equation can be solved by quadratures
if it can be written in the form

$$\varphi\left(x, \frac{\partial S}{\partial x}\right) + \psi\left(y, \frac{\partial S}{\partial y}\right) = 0.$$


11. By a Liouville surface is meant a surface on which the arc-length
functional has the form

$$J[y] = \int_{x_0}^{x_1} \sqrt{\varphi_1(x) + \varphi_2(y)} \, \sqrt{1 + y'^2} \, dx.$$

Prove that the equations of the geodesics on such a surface are

$$\int \frac{dx}{\sqrt{\varphi_1(x) - \alpha}} - \int \frac{dy}{\sqrt{\varphi_2(y) + \alpha}} = \beta,$$

where α and β are constants. Show that surfaces of revolution are Liouville
surfaces.


5 


THE SECOND VARIATION. 
SUFFICIENT CONDITIONS 
FOR A WEAK EXTREMUM 


Until now, in studying extrema of functionals, we have only considered 
a particular necessary condition for a functional to have a weak (relative) 
extremum for a given curve y, i.e., the condition that the variation of the 
functional vanish for the curve y. In this chapter, we shall derive sufficient 
conditions for a functional to have a weak extremum. To find these sufficient
conditions, we must first introduce a new concept, namely, the second 
variation of a functional. We then study the properties of the second varia- 
tion, and at the same time, we derive some new necessary conditions for an 
extremum. 

As will soon be apparent, there exist sufficient conditions for an extremum 
which resemble the necessary conditions and are easy to apply. These 
sufficient conditions differ from the necessary conditions (also derived in 
this chapter) in much the same way as the sufficient conditions $y' = 0$,
$y'' > 0$ for a function of one variable to have a minimum differ from the
corresponding necessary conditions $y' = 0$, $y'' \geq 0$.


24. Quadratic Functionals. The Second Variation of a Functional 


We begin by introducing some general concepts that will be needed later. 


A functional B[x, y] depending on two elements x and y, belonging to some
normed linear space $\mathscr{R}$, is said to be bilinear if it is a linear functional of y for
any fixed x and a linear functional of x for any fixed y (cf. p. 8). Thus,

$$B[x + y, z] = B[x, z] + B[y, z], \qquad B[\alpha x, y] = \alpha B[x, y],$$

and

$$B[x, y + z] = B[x, y] + B[x, z], \qquad B[x, \alpha y] = \alpha B[x, y]$$

for any $x, y, z \in \mathscr{R}$ and any real number α.

If we set y = x in a bilinear functional, we obtain an expression called 
a quadratic functional. A quadratic functional A[x] = B[x, x] is said to be 
positive definite¹ if A[x] > 0 for every nonzero element x.

A bilinear functional defined on a finite-dimensional space is called a
bilinear form. Every bilinear form B[x, y] can be represented as

$$B[x, y] = \sum_{i,k=1}^{n} b_{ik} \xi_i \eta_k,$$

where $\xi_1, \ldots, \xi_n$ and $\eta_1, \ldots, \eta_n$ are the components of the “vectors” x and y
relative to some basis.² If we set y = x in this expression, we obtain a
quadratic form

$$A[x] = B[x, x] = \sum_{i,k=1}^{n} b_{ik} \xi_i \xi_k.$$


Example 1. The expression

$$B[x, y] = \int_a^b x(t) y(t) \, dt$$

is a bilinear functional defined on the space $\mathscr{C}$ of all functions which are
continuous in the interval a ≤ t ≤ b. The corresponding quadratic functional
is

$$A[x] = \int_a^b x^2(t) \, dt.$$

Example 2. A more general bilinear functional defined on $\mathscr{C}$ is

$$B[x, y] = \int_a^b \alpha(t) x(t) y(t) \, dt,$$

where α(t) is a fixed function. If α(t) > 0 for all t in [a, b], then the corresponding
quadratic functional

$$A[x] = \int_a^b \alpha(t) x^2(t) \, dt$$

is positive definite.
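
The link between Example 2 and bilinear forms can be made concrete by discretization. The sketch below is only an illustration under stated assumptions: the midpoint rule, the weight α(t) = 1 + t² on [0, 1] and all function names are hypothetical choices, not part of the text. Replacing the integral by a midpoint sum turns B[x, y] into a bilinear form with the diagonal coefficients $b_{ii} = \alpha(t_i)\,\Delta t$.

```python
def bilinear_form(coeffs, xi, eta):
    """B[x, y] = sum over i, k of coeffs[i][k] * xi[i] * eta[k]."""
    return sum(coeffs[i][k] * xi[i] * eta[k]
               for i in range(len(xi)) for k in range(len(eta)))


def quadratic_form(coeffs, xi):
    """A[x] = B[x, x]."""
    return bilinear_form(coeffs, xi, xi)


def discretize_example_2(alpha, a, b, n):
    """Midpoint-rule discretization of Example 2 on n subintervals of [a, b].

    Returns the nodes t_i and the matrix with alpha(t_i) * dt on the diagonal.
    """
    dt = (b - a) / n
    nodes = [a + (i + 0.5) * dt for i in range(n)]
    coeffs = [[alpha(nodes[i]) * dt if i == k else 0.0 for k in range(n)]
              for i in range(n)]
    return nodes, coeffs


if __name__ == "__main__":
    nodes, coeffs = discretize_example_2(lambda t: 1.0 + t * t, 0.0, 1.0, 200)
    x = [t for t in nodes]          # samples of x(t) = t
    y = [1.0 - t for t in nodes]    # samples of y(t) = 1 - t
    print(quadratic_form(coeffs, x))   # approximates the integral of (1 + t^2) t^2
    print(bilinear_form(coeffs, x, y))
```

Since every diagonal coefficient is positive when α(t) > 0, the discretized quadratic form is positive for every nonzero vector, which mirrors the positive definiteness asserted in Example 2.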


1 Actually, the word “definite” is redundant here, but will be retained for traditional
reasons. Quadratic functionals A[x] such that A[x] ≥ 0 for all x will simply be called
nonnegative (see p. 103 ff.).

2 See e.g., G. E. Shilov, op. cit., p. 114. 
Example 3. The expression

$$A[x] = \int_a^b \left[ \alpha(t) x^2(t) + \beta(t) x(t) x'(t) + \gamma(t) x'^2(t) \right] dt$$

is a quadratic functional defined on the space $\mathscr{D}_1$ of all functions which are
continuously differentiable in the interval [a, b].


Example 4. The integral

$$B[x, y] = \int_a^b \int_a^b K(s, t) x(s) y(t) \, ds \, dt,$$

where K(s, t) is a fixed function of two variables, is a bilinear functional
defined on $\mathscr{C}$. Replacing y(t) by x(t), we obtain a quadratic functional.


We now introduce the concept of the second variation (or second differential)
of a functional. Let J[y] be a functional defined on some normed linear
space $\mathscr{R}$. In Chapter 1, we called the functional J[y] differentiable if its
increment

$$\Delta J[h] = J[y + h] - J[y]$$

can be written in the form

$$\Delta J[h] = \varphi[h] + \varepsilon \|h\|,$$

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. The quantity $\varphi[h]$
is the principal linear part of the increment $\Delta J[h]$, and is called the (first)
variation [or (first) differential] of J[y], denoted by $\delta J[h]$.

Similarly, we say that the functional J[y] is twice differentiable if its increment
can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon \|h\|^2,$$

where $\varphi_1[h]$ is a linear functional (in fact, the first variation), $\varphi_2[h]$ is a quadratic
functional, and $\varepsilon \to 0$ as $\|h\| \to 0$. The quadratic functional $\varphi_2[h]$ is
called the second variation (or second differential) of the functional J[y],
and is denoted by $\delta^2 J[h]$.³ From now on, it will be tacitly assumed that we
are dealing with functionals which are twice differentiable. The second
variation of such a functional is uniquely defined. This is proved in just
the same way as the uniqueness of the first variation of a differentiable
functional (see Theorem 1 of Sec. 3.2).


THEOREM 1. A necessary condition for the functional J[y] to have a
minimum for $y = \hat{y}$ is that

$$\delta^2 J[h] \geq 0 \tag{1}$$

for $y = \hat{y}$ and all admissible h. For a maximum, the sign ≥ in (1) is
replaced by ≤.


3 The comment made in footnote 6, p. 12 applies here as well. 
Proof. By definition, we have

$$\Delta J[h] = \delta J[h] + \delta^2 J[h] + \varepsilon \|h\|^2, \tag{2}$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. According to Theorem 2 of Sec. 3.2, $\delta J[h] = 0$
for $y = \hat{y}$ and all admissible h, and hence (2) becomes

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2. \tag{3}$$

Thus, for sufficiently small $\|h\|$, the sign of $\Delta J[h]$ will be the same as the
sign of $\delta^2 J[h]$. Now suppose that $\delta^2 J[h_0] < 0$ for some admissible
$h_0$. Then for any α ≠ 0, no matter how small, we have

$$\delta^2 J[\alpha h_0] = \alpha^2 \delta^2 J[h_0] < 0.$$

Hence, (3) can be made negative for arbitrarily small $\|h\|$. But this is
impossible, since by hypothesis J[y] has a minimum for $y = \hat{y}$, i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \geq 0$$

for all sufficiently small $\|h\|$. This contradiction proves the theorem.


The condition $\delta^2 J[h] \geq 0$ is necessary but of course not sufficient for the
functional J[y] to have a minimum for a given function. To obtain a
sufficient condition, we introduce the following concept: We say that a
quadratic functional $\varphi_2[h]$ defined on some normed linear space $\mathscr{R}$ is strongly
positive if there exists a constant k > 0 such that

$$\varphi_2[h] \geq k \|h\|^2$$

for all h.⁴


THEOREM 2. A sufficient condition for a functional J[y] to have a minimum
for $y = \hat{y}$, given that the first variation $\delta J[h]$ vanishes for $y = \hat{y}$,
is that its second variation $\delta^2 J[h]$ be strongly positive for $y = \hat{y}$.


Proof. For $y = \hat{y}$, we have $\delta J[h] = 0$ for all admissible h, and hence

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2,$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. Moreover, for $y = \hat{y}$,

$$\delta^2 J[h] \geq k \|h\|^2,$$

where k = const > 0. Thus, for sufficiently small $\varepsilon_1$, $|\varepsilon| < \frac{1}{2}k$ if
$\|h\| < \varepsilon_1$. It follows that

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2 > \tfrac{1}{2} k \|h\|^2 > 0$$

if $\|h\| < \varepsilon_1$, i.e., J[y] has a minimum for $y = \hat{y}$, as asserted.


4 In a finite-dimensional space, strong positivity of a quadratic form is equivalent to
positive definiteness of the quadratic form. Therefore, a function of a finite number of
variables has a minimum at a point P where its first differential vanishes, if its second 
differential is positive at P. In the general case, however, strong positivity is a stronger 
condition than positive definiteness. 
25. The Formula for the Second Variation. Legendre’s Condition


Let F(x, y, z) be a function with continuous partial derivatives up to 
order three with respect to all its arguments. (Henceforth, similar smooth- 
ness requirements will be assumed to hold whenever needed.) We now 
find an expression for the second variation in the case of the simplest varia- 
tional problem, i.e., for functionals of the form 


$$J[y] = \int_a^b F(x, y, y') \, dx, \tag{4}$$

defined for curves y = y(x) with fixed end points

$$y(a) = A, \qquad y(b) = B.$$

First, we give the function y(x) an increment h(x) satisfying the boundary
conditions

$$h(a) = 0, \qquad h(b) = 0. \tag{5}$$

Then, using Taylor’s theorem with remainder, we write the increment of the
functional J[y] as

$$\Delta J[h] = J[y + h] - J[y] = \int_a^b (F_y h + F_{y'} h') \, dx + \frac{1}{2} \int_a^b (\bar{F}_{yy} h^2 + 2\bar{F}_{yy'} hh' + \bar{F}_{y'y'} h'^2) \, dx, \tag{6}$$

where, as usual, the overbar indicates that the corresponding derivatives are
evaluated along certain intermediate curves, i.e.,

$$\bar{F}_{yy} = F_{yy}(x, y + \theta h, y' + \theta h') \qquad (0 < \theta < 1),$$

and similarly for $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$.
If we replace $\bar{F}_{yy}$, $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$ by the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$ evaluated
at the point (x, y(x), y'(x)), then (6) becomes

$$\Delta J[h] = \int_a^b (F_y h + F_{y'} h') \, dx + \frac{1}{2} \int_a^b (F_{yy} h^2 + 2F_{yy'} hh' + F_{y'y'} h'^2) \, dx + \varepsilon, \tag{7}$$

where ε can be written as

$$\int_a^b (\varepsilon_1 h^2 + \varepsilon_2 hh' + \varepsilon_3 h'^2) \, dx. \tag{8}$$

Because of the continuity of the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$, it follows
that $\varepsilon_1, \varepsilon_2, \varepsilon_3 \to 0$ as $\|h\|_1 \to 0$, from which it is apparent that ε is an infinitesimal
of order higher than 2 relative to $\|h\|_1$. The first term in the right-hand
side of (7) is $\delta J[h]$, and the second term, which is quadratic in h, is
the second variation $\delta^2 J[h]$. Thus, for the functional (4) we have

$$\delta^2 J[h] = \frac{1}{2} \int_a^b (F_{yy} h^2 + 2F_{yy'} hh' + F_{y'y'} h'^2) \, dx. \tag{9}$$
We now transform (9) into a more convenient form. Integrating by parts
and taking account of (5), we obtain

$$\int_a^b 2F_{yy'} hh' \, dx = -\int_a^b \left(\frac{d}{dx} F_{yy'}\right) h^2 \, dx.$$

Therefore, (9) can be written as

$$\delta^2 J[h] = \int_a^b (Ph'^2 + Qh^2) \, dx, \tag{10}$$

where

$$P = P(x) = \frac{1}{2} F_{y'y'}, \qquad Q = Q(x) = \frac{1}{2} \left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{11}$$


This is the expression for the second variation which will be used below. 
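
Formulas (9) and (10) can be compared numerically with the definition of the second variation from Sec. 24. The sketch below assumes the integrand F = y²y'², the curve y = 1 + x on [0, 1] and the increment h = sin πx; these are illustrative choices, not taken from the text, and the curve need not be an extremal for the identity between (9) and (10) to hold. Since $\Delta J[th] = t\,\delta J[h] + t^2 \delta^2 J[h] + \ldots$, the second variation can also be estimated by the symmetric difference $(J[y + th] - 2J[y] + J[y - th])/(2t^2)$. All function names are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


A, B = 0.0, 1.0


def y(x):
    return 1.0 + x


def dy(x):
    return 1.0


def h(x):
    return math.sin(math.pi * x)      # satisfies h(A) = h(B) = 0


def dh(x):
    return math.pi * math.cos(math.pi * x)


def J(t):
    """J[y + t h] for the assumed integrand F = y**2 * y'**2."""
    return simpson(lambda x: (y(x) + t * h(x))**2 * (dy(x) + t * dh(x))**2, A, B)


def second_variation_9():
    """Formula (9) with F_yy = 2 y'^2, F_yy' = 4 y y', F_y'y' = 2 y^2."""
    def integrand(x):
        f_yy = 2.0 * dy(x)**2
        f_yyp = 4.0 * y(x) * dy(x)
        f_ypyp = 2.0 * y(x)**2
        return f_yy * h(x)**2 + 2.0 * f_yyp * h(x) * dh(x) + f_ypyp * dh(x)**2
    return 0.5 * simpson(integrand, A, B)


def second_variation_10():
    """Formula (10): P = y^2 and, because y'' = 0 here, Q = (2 y'^2 - 4 y'^2) / 2."""
    def integrand(x):
        p = y(x)**2
        q = 0.5 * (2.0 * dy(x)**2 - 4.0 * dy(x)**2)
        return p * dh(x)**2 + q * h(x)**2
    return simpson(integrand, A, B)


def second_variation_fd(t=1e-3):
    """Symmetric-difference estimate of the second variation."""
    return (J(t) - 2.0 * J(0.0) + J(-t)) / (2.0 * t**2)


if __name__ == "__main__":
    print(second_variation_9(), second_variation_10(), second_variation_fd())
```

The three printed numbers should agree to within the quadrature and finite-difference errors.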

The following consequence of formulas (7) and (8) should be noted. If
J[y] has an extremum for the curve y = y(x), and if y = y(x) + h(x) is an
admissible curve, then

$$\Delta J[h] = \int_a^b (Ph'^2 + Qh^2) \, dx + \int_a^b (\xi h^2 + \eta h'^2) \, dx, \tag{12}$$

where $\xi, \eta \to 0$ as $\|h\|_1 \to 0$. In fact, since J[y] has an extremum for y = y(x),
the linear terms in the right-hand side of (7) vanish, while the quantity (8)
can be written in the form

$$\int_a^b (\xi h^2 + \eta h'^2) \, dx$$

by integrating the term $\varepsilon_2 hh'$ by parts and using the boundary conditions (5).
Formula (12) will be used later, when we derive sufficient conditions for a
weak extremum (see Sec. 27).

It was proved in Sec. 24 that a necessary condition for a functional J[y] to
have a minimum is that its second variation $\delta^2 J[h]$ be nonnegative. In the
case of a functional of the form (4), we can use formula (10) to establish a
necessary condition for the second variation to be nonnegative. The argument
goes as follows: Consider the quadratic functional (10) for functions
h(x) satisfying the condition h(a) = 0. With this condition, the function
h(x) will be small in the interval [a, b] if its derivative h'(x) is small in [a, b].
However, the converse is not true, i.e., we can construct a function h(x)
which is itself small but has a large derivative h'(x) in [a, b]. This implies
that the term $Ph'^2$ plays the dominant role in the quadratic functional (10),
in the sense that $Ph'^2$ can be much larger than the second term $Qh^2$ but it
cannot be much smaller than $Qh^2$ (it is assumed that P ≠ 0). Therefore,
it might be expected that the coefficient P(x) determines whether the functional
(10) takes values with just one sign or values with both signs. We now
make this qualitative argument precise:
LEMMA. A necessary condition for the quadratic functional

$$\delta^2 J[h] = \int_a^b (Ph'^2 + Qh^2) \, dx, \tag{13}$$

defined for all functions $h(x) \in \mathscr{D}_1(a, b)$ such that h(a) = h(b) = 0, to be
nonnegative is that

$$P(x) \geq 0 \qquad (a \leq x \leq b). \tag{14}$$

Proof. Suppose (14) does not hold, i.e., suppose (without loss of
generality) that $P(x_0) = -2\beta$ (β > 0) at some point $x_0$ in [a, b]. Then,
since P(x) is continuous, there exists an α > 0 such that $a \leq x_0 - \alpha$,
$x_0 + \alpha \leq b$, and

$$P(x) < -\beta \qquad (x_0 - \alpha \leq x \leq x_0 + \alpha).$$

We now construct a function $h(x) \in \mathscr{D}_1(a, b)$ such that the functional (13)
is negative. In fact, let

$$h(x) = \begin{cases} \sin^2 \dfrac{\pi(x - x_0)}{\alpha} & \text{for } x_0 - \alpha \leq x \leq x_0 + \alpha, \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

Then we have

$$\int_a^b (Ph'^2 + Qh^2) \, dx = \int_{x_0 - \alpha}^{x_0 + \alpha} P \frac{\pi^2}{\alpha^2} \sin^2 \frac{2\pi(x - x_0)}{\alpha} \, dx + \int_{x_0 - \alpha}^{x_0 + \alpha} Q \sin^4 \frac{\pi(x - x_0)}{\alpha} \, dx < -\frac{\beta \pi^2}{\alpha} + 2M\alpha, \tag{16}$$

where

$$M = \max_{a \leq x \leq b} |Q(x)|.$$

For sufficiently small α, the right-hand side of (16) becomes negative,
and hence (13) is negative for the corresponding function h(x) defined
by (15). This proves the lemma.
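
The construction used in the proof can be tried out numerically. The sketch below assumes the coefficients P(x) = 4(x − ½)² − 0.2 and Q(x) = 50 on [0, 1]; these are hypothetical choices made only so that P is negative near x₀ = ½ while Q is large and positive. It evaluates the functional (13) for the narrow bump (15) and, for contrast, for the wide function h = sin πx.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def P(x):
    return 4.0 * (x - 0.5)**2 - 0.2   # negative in a neighborhood of x0 = 0.5


def Q(x):
    return 50.0


def bump(x0, alpha):
    """The function (15) and its derivative, supported on [x0 - alpha, x0 + alpha]."""
    def h(x):
        if abs(x - x0) <= alpha:
            return math.sin(math.pi * (x - x0) / alpha)**2
        return 0.0

    def dh(x):
        if abs(x - x0) <= alpha:
            return (math.pi / alpha) * math.sin(2.0 * math.pi * (x - x0) / alpha)
        return 0.0

    return h, dh


def quadratic_functional(h, dh, a=0.0, b=1.0):
    """The functional (13) for the assumed coefficients P and Q."""
    return simpson(lambda x: P(x) * dh(x)**2 + Q(x) * h(x)**2, a, b)


if __name__ == "__main__":
    narrow = quadratic_functional(*bump(0.5, 0.05))
    wide = quadratic_functional(lambda x: math.sin(math.pi * x),
                                lambda x: math.pi * math.cos(math.pi * x))
    print(narrow, wide)
```

For the wide function the positive term $Qh^2$ dominates, whereas for the narrow bump the derivative term $Ph'^2$ takes over and the functional becomes negative, as the lemma predicts.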


Using the lemma and the necessary condition for a minimum proved in 
Sec. 24, we immediately obtain 


THEOREM (Legendre). A necessary condition for the functional

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B$$

to have a minimum for the curve y = y(x) is that the inequality

$$F_{y'y'} \geq 0$$

(Legendre’s condition) be satisfied at every point of the curve.


Legendre attempted (unsuccessfully) to show that a sufficient condition
for J[y] to have a (weak) minimum for the curve y = y(x) is that the strict
inequality

$$F_{y'y'} > 0 \tag{17}$$

(the strengthened Legendre condition) be satisfied at every point of the curve.
His approach was to first write the second variation (10) in the form

$$\delta^2 J[h] = \int_a^b \left[ Ph'^2 + 2whh' + (Q + w')h^2 \right] dx, \tag{18}$$

where w(x) is an arbitrary differentiable function, using the fact that

$$0 = \int_a^b \frac{d}{dx}(wh^2) \, dx = \int_a^b (w'h^2 + 2whh') \, dx, \tag{19}$$

since h(a) = h(b) = 0. Next, he observed that the condition (17) would
indeed be sufficient if it were possible to find a function w(x) for which the
integrand in (18) is a perfect square. However, this is not always possible,
as was first shown by Legendre himself, since then w(x) would have to
satisfy the equation

$$P(Q + w') = w^2, \tag{20}$$

and although this equation is “locally solvable,” it may not have a solution
in a sufficiently large interval.⁵
Actually, the following argument shows that the requirement that

$$F_{y'y'}[x, y(x), y'(x)] > 0 \tag{21}$$

be satisfied at every point of an extremal y = y(x) cannot be a sufficient
condition for the extremal to be a minimum of the functional J[y]. The
condition (21), like the condition

$$F_y - \frac{d}{dx} F_{y'} = 0$$

characterizing the extremal is of a “local” character, i.e., it does not pertain
to the curve as a whole, but only to individual points of the curve. Therefore,
if the condition (21) holds for any two curves AB and BC, it also holds for
the curve AC formed by joining AB and BC. On the other hand, the fact
that a functional has an extremum for each part AB and BC of some curve 
AC does not imply that it has an extremum for the whole curve AC. For 
example, a great circle arc on a given sphere is the shortest curve joining 
its end points if the arc consists of less than half a circle, but it is not the 
shortest curve (even in the class of neighboring curves) if the arc consists of 
more than half a circle. However, every great circle arc on a given sphere 
is an extremal of the functional which represents arc length on the sphere, 
and in fact it is easily verified that for this functional, (21) holds at every 
point of the great circle arc. Therefore, (21) cannot be a sufficient condition 


5 For example, if P = −1, Q = 1, we obtain the equation $w' + 1 + w^2 = 0$, so that
w(x) = tan (c − x). If b − a > π, there is no solution in the whole interval [a, b],
since then tan (c − x) must become infinite somewhere in [a, b].
for an extremum, nor, for that matter, can any set of purely local conditions
be sufficient.

Although the condition (21) does not guarantee a minimum, the idea of
completing the square of the integrand in formula (18) for the second varia- 
tion, with the aim of finding sufficient conditions for an extremum, turns 
out to be very fruitful. In fact, the differential equation (20), which comes 
to the fore when trying to implement this idea, leads to new necessary 
conditions for an extremum (which are no longer local!). We shall discuss 
these matters further in the next two sections. 


26. Analysis of the Quadratic Functional $\int_a^b (Ph'^2 + Qh^2) \, dx$


As shown in the preceding section, to pursue our study of the ‘‘simplest”’ 
variational problem, i.e., that of finding the extrema of the functional 


$$J[y] = \int_a^b F(x, y, y') \, dx, \tag{22}$$

where

$$y(a) = A, \qquad y(b) = B,$$

we have to analyze the quadratic functional⁶

$$\int_a^b (Ph'^2 + Qh^2) \, dx, \tag{23}$$

defined on the set of functions h(x) satisfying the conditions

$$h(a) = 0, \qquad h(b) = 0. \tag{24}$$

Here, the functions P and Q are related to the function F appearing in the
integrand of (22) by the formulas

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2} \left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{25}$$


For the time being, we ignore the fact that (23) is a second variation, satisfying 
the relations (25), and instead, we treat the analysis of (23) as an independent 
problem, in its own right. 

In the last section, we saw that the condition 


$$P(x) \geq 0 \qquad (a \leq x \leq b)$$

is necessary but not sufficient for the quadratic functional (23) to be ≥ 0
for all admissible h(x). In this section, it will be assumed that the strengthened
inequality

$$P(x) > 0 \qquad (a \leq x \leq b)$$


6 Similarly, the study of extrema of functions of several variables (in particular, the
derivation of sufficient conditions for an extremum) involves the analysis of a quadratic
form (the second differential).
holds. We then proceed to find conditions which are both necessary and
sufficient for the functional (23) to be > 0 for all admissible $h(x) \not\equiv 0$, i.e.,
to be positive definite. We begin by writing the Euler equation

$$-\frac{d}{dx}(Ph') + Qh = 0 \tag{26}$$

corresponding to the functional (23).⁷ This is a linear differential equation
of the second order, which is satisfied, together with the boundary conditions
(24), or more generally, the boundary conditions

$$h(a) = 0, \qquad h(c) = 0 \qquad (a < c \leq b),$$

by the function $h(x) \equiv 0$. However, in general, (26) can have other, nontrivial
solutions satisfying the same boundary conditions. In this connection,
we introduce the following important concept:


DEFINITION. The point $\tilde{a}$ (≠ a) is said to be conjugate to the point a if
the equation (26) has a solution which vanishes for x = a and $x = \tilde{a}$ but
is not identically zero.


Remark. If h(x) is a solution of (26) which is not identically zero and
satisfies the conditions h(a) = h(c) = 0, then Ch(x) is also such a solution,
where C = const ≠ 0. Therefore, for definiteness, we can impose some kind
of normalization on h(x), and in fact we shall usually assume that the constant
C has been chosen to make h'(a) = 1.⁸
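
The definition suggests a direct numerical procedure: integrate (26) from x = a with h(a) = 0, h'(a) = 1 and look for the first zero of the solution. The sketch below is one possible way to do this, assuming P(x) > 0, rewriting (26) as the first-order system h' = v/P, v' = Qh with v = Ph', and using the classical fourth-order Runge-Kutta method; the function name and the test coefficients P = 1, Q = ∓1 are hypothetical choices, not taken from the text.

```python
import math


def first_conjugate_point(P, Q, a, b, n=20000):
    """Return the first point in (a, b] conjugate to a, or None if there is none.

    Integrates -(P h')' + Q h = 0 with h(a) = 0, h'(a) = 1 by RK4 applied to
    the system h' = v / P, v' = Q h, where v = P h'. Assumes P(x) > 0 on [a, b].
    """
    dx = (b - a) / n

    def rhs(x, h, v):
        return v / P(x), Q(x) * h

    x, h, v = a, 0.0, P(a)
    for _ in range(n):
        k1h, k1v = rhs(x, h, v)
        k2h, k2v = rhs(x + dx / 2, h + dx / 2 * k1h, v + dx / 2 * k1v)
        k3h, k3v = rhs(x + dx / 2, h + dx / 2 * k2h, v + dx / 2 * k2v)
        k4h, k4v = rhs(x + dx, h + dx * k3h, v + dx * k3v)
        h_new = h + dx / 6 * (k1h + 2 * k2h + 2 * k3h + k4h)
        v_new = v + dx / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        # The solution starts out positive; its first return to zero is located
        # by linear interpolation inside the step where the sign changes.
        if h > 0.0 and h_new <= 0.0:
            return x + dx * h / (h - h_new)
        x, h, v = x + dx, h_new, v_new
    return None


if __name__ == "__main__":
    print(first_conjugate_point(lambda x: 1.0, lambda x: -1.0, 0.0, 4.0))
    print(first_conjugate_point(lambda x: 1.0, lambda x: -1.0, 0.0, 3.0))
    print(first_conjugate_point(lambda x: 1.0, lambda x: 1.0, 0.0, 4.0))
```

For P = 1, Q = −1 the normalized solution is h = sin (x − a), so the first conjugate point is a + π; it lies in [0, 4] but not in [0, 3]. For P = 1, Q = 1 the solution is sinh (x − a), which never returns to zero.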


The following theorem effectively realizes Legendre’s idea, mentioned on 
p. 104. 


THEOREM 1. If

$$P(x) > 0 \qquad (a \leq x \leq b),$$

and if the interval [a, b] contains no points conjugate to a, then the quadratic
functional

$$\int_a^b (Ph'^2 + Qh^2) \, dx \tag{27}$$

is positive definite for all h(x) such that h(a) = h(b) = 0.


7 It must not be thought that this is done in order to find the minimum of the functional
(23). In fact, because of the homogeneity of (23), its minimum is either 0 if the functional
is positive definite, or −∞ otherwise. In the latter case, it is obvious that the
minimum cannot be found from the Euler equation. The importance of the Euler
equation (26) in our analysis of the quadratic functional (23) will become apparent in
Theorem 1. The reader should also not be confused by our use of the same symbol
h(x) to denote both admissible functions, in the domain of the functional (23), and
solutions of equation (26). This notation is convenient, but whereas admissible functions
must satisfy h(a) = h(b) = 0, the condition h(b) = 0 will usually be explicitly
precluded for nontrivial solutions of (26).

8 If $h(x) \not\equiv 0$ and h(a) = 0, then h'(a) must be nonzero, because of the uniqueness
theorem for the linear differential equation (26). See e.g., E. A. Coddington, An
Introduction to Ordinary Differential Equations, Prentice-Hall, Inc., Englewood Cliffs,
New Jersey (1961), pp. 105, 260.
Proof. The fact that the functional (27) is positive definite will be
proved if we can reduce it to the form

$$\int_a^b P(x) \varphi^2(\cdots) \, dx,$$

where $\varphi^2(\cdots)$ is some expression which cannot be identically zero unless
$h(x) \equiv 0$. To achieve this, we add a quantity of the form $\frac{d}{dx}(wh^2)$ to the
integrand of (27), where w(x) is a differentiable function. This will
not change the value of the functional (27), since h(a) = h(b) = 0 implies
that

$$\int_a^b \frac{d}{dx}(wh^2) \, dx = 0$$

[cf. equation (19)].
We now select a function w(x) such that the expression

$$Ph'^2 + Qh^2 + \frac{d}{dx}(wh^2) = Ph'^2 + 2whh' + (Q + w')h^2 \tag{28}$$

is a perfect square. This will be the case if w(x) is chosen to be a
solution of the equation

$$P(Q + w') = w^2 \tag{29}$$

[cf. equation (20)]. In fact, if (29) holds, we can write (28) in the form

$$P\left(h' + \frac{w}{P} h\right)^2.$$

Thus, if (29) has a solution defined on the whole interval [a, b], the quadratic
functional (27) can be transformed into

$$\int_a^b P\left(h' + \frac{w}{P} h\right)^2 dx, \tag{30}$$

and is therefore nonnegative.
Moreover, if (30) vanishes for some function h(x), then obviously

$$h'(x) + \frac{w}{P} h(x) \equiv 0, \tag{31}$$

since P(x) > 0 for a ≤ x ≤ b. Therefore the boundary condition
h(a) = 0 implies $h(x) \equiv 0$, because of the uniqueness theorem for
the first-order differential equation (31). It follows that the functional
(30) is actually positive definite.

Thus, the proof of the theorem reduces to showing that the absence of
points in [a, b] which are conjugate to a guarantees that (29) has a solution
defined on the whole interval [a, b]. Equation (29) is a Riccati
equation, which can be reduced to a linear differential equation of the
second order by making a change of variables. In fact, setting

$$w = -\frac{u'}{u} P, \tag{32}$$

where u is a new unknown function, we obtain the equation

$$-\frac{d}{dx}(Pu') + Qu = 0, \tag{33}$$

which is just the Euler equation (26) of the functional (27). If there are
no points conjugate to a in [a, b], then (33) has a solution which does not
vanish anywhere in [a, b],⁹ and then there exists a solution of (29),
given by (32), which is defined on the whole interval [a, b]. This completes
the proof of the theorem.
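
The steps of the proof can be followed numerically in a simple case. The sketch below assumes P = 1, Q = −1 on the interval [0, 1] and the nonvanishing solution u = cos x of (33); these are illustrative choices, not taken from the text, and the function names are hypothetical. It checks that w = −Pu'/u (here w = tan x) satisfies the Riccati equation (29), and that the functionals (27) and (30) take the same value for the admissible function h = sin πx.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def P(x):
    return 1.0


def Q(x):
    return -1.0


def u(x):
    return math.cos(x)        # solves -(P u')' + Q u = 0 and is nonzero on [0, 1]


def du(x):
    return -math.sin(x)


def w(x):
    return -P(x) * du(x) / u(x)   # the substitution (32); here w = tan x


def riccati_residual(x, eps=1e-6):
    """P (Q + w') - w^2, with w' approximated by a central difference."""
    dw = (w(x + eps) - w(x - eps)) / (2 * eps)
    return P(x) * (Q(x) + dw) - w(x)**2


def h(x):
    return math.sin(math.pi * x)  # admissible: h(0) = h(1) = 0


def dh(x):
    return math.pi * math.cos(math.pi * x)


def functional_27():
    return simpson(lambda x: P(x) * dh(x)**2 + Q(x) * h(x)**2, 0.0, 1.0)


def functional_30():
    return simpson(lambda x: P(x) * (dh(x) + w(x) / P(x) * h(x))**2, 0.0, 1.0)


if __name__ == "__main__":
    print(riccati_residual(0.3), riccati_residual(0.9))
    print(functional_27(), functional_30())
```

Both functionals should come out equal to (π² − 1)/2 up to quadrature error. On an interval of length greater than π no solution of (33) is free of zeros, w = tan (x + c) blows up somewhere, and the reduction to (30) is no longer available.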


Remark. The reduction of the quadratic functional (27) to the form (30)
is the continuous analog of the reduction of a quadratic form to a sum of
squares. The absence of points conjugate to a in the interval [a, b] is the
analog of the familiar criterion for a quadratic form to be positive definite.
This connection will be discussed further in Sec. 30.

Next, we show that the absence of points conjugate to a in the interval
[a, b] is not only sufficient but also necessary for the functional (27) to be
positive definite.


LEMMA. If the function h = h(x) satisfies the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

and the boundary conditions

$$h(a) = h(b) = 0, \tag{34}$$

then

$$\int_a^b (Ph'^2 + Qh^2) \, dx = 0.$$

Proof. The lemma is an immediate consequence of the formula

$$0 = \int_a^b \left[ -\frac{d}{dx}(Ph') + Qh \right] h \, dx = \int_a^b (Ph'^2 + Qh^2) \, dx,$$

which is obtained by integrating by parts and using (34).


9 If the interval [a, b] contains no points conjugate to a, then, since the solution of the
differential equation (26) depends continuously on the initial conditions, the interval
[a, b] contains no points conjugate to a − ε, for some sufficiently small ε. Therefore,
the solution which satisfies the initial conditions h(a − ε) = 0, h'(a − ε) = 1 does not
vanish anywhere in the interval [a, b]. Implicit in this argument is the assumption that
P does not vanish in [a, b].
THEOREM 2. If the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{35}$$

where 

$$P(x) > 0 \qquad (a \le x \le b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0, then the interval 
[a, b] contains no points conjugate to a. 


Proof. The idea of the proof is the following: We construct a family 
of positive definite functionals, depending on a parameter t, which for 
t = 1 gives the functional (35) and for t = 0 gives the very simple 
quadratic functional 

$$\int_a^b h'^2\,dx,$$

for which there can certainly be no points in [a, b] conjugate to a. 
Then we prove that as the parameter t is varied continuously from 0 
to 1, no conjugate points can appear in the interval [a, b]. 

Thus, consider the functional 

$$\int_a^b \left[t(Ph'^2 + Qh^2) + (1 - t)h'^2\right] dx, \tag{36}$$

which is obviously positive definite for all t, 0 ≤ t ≤ 1, since (35) 
is positive definite by hypothesis. The Euler equation corresponding to 
(36) is 

$$-\frac{d}{dx}\left\{[tP + (1 - t)]h'\right\} + tQh = 0. \tag{37}$$


Let h(x, t) be the solution of (37) such that h(a, t) = 0, $h_x(a, t) = 1$ for 
all t, 0 ≤ t ≤ 1. This solution is a continuous function of the parameter 
t, which for t = 1 reduces to the solution h(x) of equation (26) satisfying 
the initial conditions h(a) = 0, h'(a) = 1, and for t = 0 reduces to the 
solution of the equation h'' = 0 satisfying the same initial conditions, 
i.e., the function h = x − a. We note that if $h(x_0, t_0) = 0$ at some 
point $(x_0, t_0)$, then $h_x(x_0, t_0) \ne 0$. In fact, for any fixed t, h(x, t) satisfies 
(37), and if the equations $h(x_0, t_0) = 0$, $h_x(x_0, t_0) = 0$ were satisfied 
simultaneously, we would have $h(x, t_0) = 0$ for all x, a ≤ x ≤ b, because of 
the uniqueness theorem for linear differential equations. But this is 
impossible, since $h_x(a, t) = 1$ for all t, 0 ≤ t ≤ 1. 

Suppose now that the interval [a, b] contains a point ã conjugate 
to a, i.e., suppose that h(x, 1) vanishes at some point x = ã in [a, b]. 
Then ã ≠ b, since otherwise, according to the lemma, 

$$\int_a^b (Ph'^2 + Qh^2)\,dx = 0$$

for a function h(x) ≢ 0 satisfying the conditions h(a) = h(b) = 0, 
which would contradict the assumption that the functional (35) is positive 
definite. Therefore, the proof of the theorem reduces to showing that 
[a, b] contains no interior point ã conjugate to a. 


FIGURE 7 


To prove this, we consider the set of all points (x, t), a < x ≤ b, 
satisfying the condition h(x, t) = 0.¹⁰ This set, if it is nonempty, 
represents a curve in the xt-plane, since at each point where h(x, t) = 0, 
the derivative $h_x(x, t)$ is different from zero, and hence, according to the 
implicit function theorem, the equation h(x, t) = 0 defines a continuous 
function x = x(t) in the neighborhood of each such point.¹¹ By 
hypothesis, the point (ã, 1) lies on this curve. Thus, starting from the 
point (ã, 1), the curve (see Figure 7) 


A. Cannot terminate inside the rectangle a ≤ x ≤ b, 0 ≤ t ≤ 1, 
since this would contradict the continuous dependence of the 
solution h(x, t) on the parameter t; 


B. Cannot intersect the segment x = b, 0 ≤ t ≤ 1, since then, by 
exactly the same argument as in the lemma [but applied to equation 
(37), the boundary conditions h(a, t) = h(b, t) = 0 and the 
functional (36)], this would contradict the assumption that the functional 
is positive definite for all t; 


C. Cannot intersect the segment a ≤ x ≤ b, t = 1, since then for 
some t we would have h(x, t) = 0, $h_x(x, t) = 0$ simultaneously; 


D. Cannot intersect the segment a ≤ x ≤ b, t = 0, since for 
t = 0, equation (37) reduces to h'' = 0, whose solution h = x − a 
would only vanish for x = a; 


E. Cannot approach the segment x = a, 0 ≤ t ≤ 1, since then for 
some t we would have $h_x(a, t) = 0$ [why?], contrary to hypothesis. 


10 Recall that h(a, t) = 0 for all t, 0 ≤ t ≤ 1. 
11 See e.g., D. V. Widder, op. cit., p. 56. See also footnote 8, p. 47. 

It follows that no such curve can exist, and hence the proof is 
complete. 


If we replace the condition that the functional (35) be positive definite 
by the condition that it be nonnegative for all admissible h(x), we obtain 
the following result: 


THEOREM 2'. If the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{38}$$

where 

$$P(x) > 0 \qquad (a \le x \le b),$$

is nonnegative for all h(x) such that h(a) = h(b) = 0, then the interval 
[a, b] contains no interior points conjugate to a.¹² 


Proof. If the functional (38) is nonnegative, the functional (36) is 
positive definite for all t except possibly t = 1. Thus, the proof of 
Theorem 2 remains valid, except for the use of the lemma to prove that 
ã = b is impossible. Therefore, with the hypotheses of Theorem 2', 
the possibility that ã = b is not excluded. 


Combining Theorems 1 and 2, we finally obtain 

THEOREM 3. The quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx,$$

where 

$$P(x) > 0 \qquad (a \le x \le b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0 if and only if 
the interval [a, b] contains no points conjugate to a. 
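
The criterion of Theorem 3 can be explored numerically. The following is a minimal sketch, not part of the original text: the function name `first_conjugate_point`, the step count and the choice of a fixed-step Runge-Kutta scheme are all assumptions made for illustration. It integrates the equation −(Ph')' + Qh = 0 with h(a) = 0, h'(a) = 1 and reports the first zero of h to the right of a, i.e., the first point conjugate to a, if one occurs before `x_max`.

```python
import math


def first_conjugate_point(P, Q, a, x_max, steps=20000):
    """Return the first zero in (a, x_max] of the solution of
    -(P h')' + Q h = 0 with h(a) = 0, h'(a) = 1, or None if there is none.

    P and Q are callables; P is assumed to be positive on [a, x_max].
    """
    # Rewrite the second-order equation as a first-order system in
    # h and p = P h':  h' = p / P,  p' = Q h.
    def rhs(x, h, p):
        return p / P(x), Q(x) * h

    dx = (x_max - a) / steps
    x, h, p = a, 0.0, P(a)  # h(a) = 0 and p(a) = P(a) * h'(a) = P(a)
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step.
        k1h, k1p = rhs(x, h, p)
        k2h, k2p = rhs(x + dx / 2, h + dx / 2 * k1h, p + dx / 2 * k1p)
        k3h, k3p = rhs(x + dx / 2, h + dx / 2 * k2h, p + dx / 2 * k2p)
        k4h, k4p = rhs(x + dx, h + dx * k3h, p + dx * k3p)
        h_new = h + dx / 6 * (k1h + 2 * k2h + 2 * k3h + k4h)
        p_new = p + dx / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        if h > 0.0 and h_new <= 0.0:
            # Sign change: locate the zero by linear interpolation.
            return x + dx * h / (h - h_new)
        x, h, p = x + dx, h_new, p_new
    return None
```

For P = 1, Q = −1 the solution is sin(x − a), so the reported point should lie close to a + π; for P = 1, Q = 1 the solution is sinh(x − a), which has no zero to the right of a, and the function returns `None`.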


27. Jacobi’s Necessary Condition. More on Conjugate Points 


We now apply the results obtained in the preceding section to the simplest 
variational problem, i.e., to the functional 


$$\int_a^b F(x, y, y')\,dx \tag{39}$$

with the boundary conditions 

$$y(a) = A, \qquad y(b) = B.$$

12 In other words, the solution of the equation 

$$-\frac{d}{dx}(Ph') + Qh = 0$$

satisfying the initial conditions h(a) = 0, h'(a) = 1 does not vanish at any interior point 
of the interval [a, b]. 
It will be recalled from Sec. 25 that the second variation of the functional 
(39) [in the neighborhood of some extremal y = y(x)] is given by 

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{40}$$

where 

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{41}$$

DEFINITION 1. The Euler equation 

$$-\frac{d}{dx}(Ph') + Qh = 0 \tag{42}$$

of the quadratic functional (40) is called the Jacobi equation of the original 
functional (39). 


DEFINITION 2. The point ã is said to be conjugate to the point a with 
respect to the functional (39) if it is conjugate to a with respect to the 
quadratic functional (40) which is the second variation of (39), i.e., if it 
is conjugate to a in the sense of the definition on p. 106. 


THEOREM (Jacobi's necessary condition). If the extremal y = y(x) 
corresponds to a minimum of the functional 

$$\int_a^b F(x, y, y')\,dx,$$

and if 

$$F_{y'y'} > 0$$

along this extremal, then the open interval (a, b) contains no points 
conjugate to a.¹³ 


Proof. In Sec. 24 it was proved that nonnegativity of the second 
variation is a necessary condition for a minimum. Moreover, according 
to Theorem 2' of Sec. 26, if the quadratic functional (40) is nonnegative, 
the interval (a, b) can contain no points conjugate to a. The theorem 
follows at once from these two facts taken together. 
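
As an illustration of how the condition is applied, consider the integrand F = y'² − y². This particular functional is chosen here only as an example and is not taken from the text. With P and Q defined as in the Jacobi equation above, the computation runs as follows:

```latex
F = y'^2 - y^2, \qquad
P = \tfrac{1}{2} F_{y'y'} = 1, \qquad
Q = \tfrac{1}{2}\Bigl(F_{yy} - \frac{d}{dx} F_{yy'}\Bigr) = \tfrac{1}{2}(-2 - 0) = -1,
\\[4pt]
-\frac{d}{dx}(P h') + Q h = -h'' - h = 0, \qquad h(a) = 0,\ h'(a) = 1
\quad\Longrightarrow\quad h(x) = \sin(x - a),
\\[4pt]
h(\tilde a) = 0 \ \text{for the first time at}\ \tilde a = a + \pi .
```

Since F_{y'y'} = 2 > 0 here, Jacobi's necessary condition shows that an extremal of this functional can give a minimum only if b − a ≤ π.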


We have just defined the Jacobi equation of the functional (39) as the Euler 
equation of the quadratic functional (40), which represents the second 
variation of (39). We can also derive Jacobi’s equation by the following 
argument: Given that y = y(x) is an extremal, let us examine the conditions 
which have to be imposed on h(x) if the varied curve y = y*(x) = y(x) + h(x) 
is to be an extremal also. Substituting y(x) + h(x) into Euler's equation 

$$F_y(x, y + h, y' + h') - \frac{d}{dx} F_{y'}(x, y + h, y' + h') = 0,$$

13 Of course, the theorem remains true if we replace the word "minimum" by 
"maximum" and the condition $F_{y'y'} > 0$ by $F_{y'y'} < 0$. 
using Taylor's formula, and bearing in mind that y(x) is already a solution 
of Euler's equation, we find that 

$$F_{yy} h + F_{yy'} h' - \frac{d}{dx}\left(F_{yy'} h + F_{y'y'} h'\right) = o(h),$$

where o(h) denotes an infinitesimal of order higher than 1 relative to h and 
its derivative. Neglecting o(h) and combining terms, we obtain the linear 
differential equation 

$$\left(F_{yy} - \frac{d}{dx} F_{yy'}\right) h - \frac{d}{dx}\left(F_{y'y'} h'\right) = 0;$$

this is just Jacobi's equation, which we previously wrote in the form (42), 
using the notation (41). In other words, Jacobi's equation, except for 
infinitesimals of order higher than 1, is the differential equation satisfied by the 
difference between two neighboring (i.e., "infinitely close") extremals. An 
equation which is satisfied to within terms of the first order by the 
difference between two neighboring solutions of a given differential equation 
is called the variational equation (of the original differential equation). 
Thus, we have just proved that Jacobi’s equation is the variational equation of 
Euler’s equation. 


Remark. These considerations are easily extended to the case of an 
arbitrary differential equation 


$$F(x, y, y', \ldots, y^{(n)}) = 0 \tag{43}$$

of order n. Let y(x) and y(x) + δy(x) be two neighboring solutions of (43). 
Replacing y(x) by y(x) + δy(x) in (43), using Taylor's formula, and bearing 
in mind that y(x) satisfies (43), we obtain 

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} + \varepsilon = 0,$$

where ε denotes a remainder term, which is an infinitesimal of order higher 
than 1 relative to δy and its derivatives. Retaining only terms of the first 
order, we obtain the linear differential equation 

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} = 0,$$

satisfied by the variation δy; as before, this equation is called the variational 
equation of the original equation (43). For initial conditions which are 
sufficiently close to zero, this equation defines a function which is the 
principal linear part of the difference between two neighboring solutions of 
(43) with neighboring initial conditions. 
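
A short worked instance may make the recipe concrete. The pendulum-type equation below is an illustrative choice, not an example from the text; it is written in the form F(x, y, y', y'') = 0 and linearized term by term as described in the Remark:

```latex
F(x, y, y', y'') = y'' + \sin y = 0, \qquad
F_y = \cos y, \quad F_{y'} = 0, \quad F_{y''} = 1,
\\[4pt]
F_y\,\delta y + F_{y'}(\delta y)' + F_{y''}(\delta y)'' = 0
\quad\Longrightarrow\quad
(\delta y)'' + \cos\bigl(y(x)\bigr)\,\delta y = 0 .
```

The coefficient cos y(x) is evaluated along the known solution y(x), so the variational equation is linear in δy even though the original equation is not.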


We now return to the concept of a conjugate point. It will be recalled 
that in Sec. 26 the point ã was said to be conjugate to the point a if h(ã) = 0, 
where h(x) is a solution of Jacobi's equation satisfying the initial conditions 
h(a) = 0, h'(a) = 1. As just shown, the difference z(x) = y*(x) − y(x) 
corresponding to two neighboring extremals y = y(x) and y = y*(x) drawn 
from the same initial point must satisfy the condition 

$$-\frac{d}{dx}(Pz') + Qz = o(z),$$

where o(z) is an infinitesimal of order higher than 1 relative to z and its 
derivative. Hence, to within such an infinitesimal, y*(x) − y(x) is a nonzero 
solution of Jacobi's equation. This leads to another definition of a 
conjugate point:¹⁴ 

DEFINITION 3. Given an extremal y = y(x), the point M̃ = (ã, y(ã)) 
is said to be conjugate to the point M = (a, y(a)) if at M̃ the difference 
y*(x) − y(x), where y = y*(x) is any neighboring extremal drawn from 
the same initial point M, is an infinitesimal of order higher than 1 relative 
to $\|y^*(x) - y(x)\|_1$. 

Still another definition of a conjugate point is possible: 


DEFINITION 4. Given an extremal y = y(x), the point M̃ = (ã, y(ã)) 
is said to be conjugate to the point M = (a, y(a)) if M̃ is the limit as 
$\|y^*(x) - y(x)\|_1 \to 0$ of the points of intersection of y = y(x) and the 
neighboring extremals y = y*(x) drawn from the same initial point M. 


It is clear that if the point M̃ is conjugate to the point M in the sense of 
Definition 4 (i.e., if the extremals intersect in the way described), then M̃ is 
also conjugate to M in the sense of Definition 3. We now verify that the 
converse is true, thereby establishing the equivalence of Definitions 3 and 4. 
Thus, let y = y(x) be the extremal under consideration, satisfying the initial 
condition 

$$y(a) = A,$$

and let $y_\alpha^*(x)$ be the extremal drawn from the same initial point M = (a, A), 
satisfying the condition 

$$y_\alpha^{*\prime}(a) - y'(a) = \alpha.$$

Then $y_\alpha^*(x)$ can be represented in the form 

$$y_\alpha^*(x) = y(x) + \alpha h(x) + \varepsilon,$$

where h(x) is a solution of the appropriate Jacobi equation, satisfying the 
conditions 

$$h(a) = 0, \qquad h'(a) = 1,$$

and ε is a quantity of order higher than 1 relative to α. 

Now let 

$$h(\tilde a) = 0, \qquad \beta = \sqrt{\varepsilon/\alpha}.$$


14 In stating this definition, we enlarge the meaning of a conjugate point to apply 
to points lying on an extremal and not just their abscissas. In all these considerations, 
it is tacitly assumed that $P = \tfrac{1}{2} F_{y'y'}$ has constant sign along the given extremal y = y(x). 
It is clear that h'(ã) ≠ 0, since h(x) ≢ 0. Using Taylor's formula, we can 
easily verify that for sufficiently small α, the expression 

$$y_\alpha^*(x) - y(x) = \alpha h(x) + \varepsilon$$

takes values with different signs at the points ã − β and ã + β. Since 
β → 0 as α → 0, this means that M̃ = (ã, y(ã)) is the limit as α → 0 of the 
points of intersection of the extremals $y = y_\alpha^*(x)$ and the extremal y = y(x). 


Example. Consider the geodesics on a sphere, i.e., the great circle arcs. 
Each such arc is an extremal of the functional which gives arc length on the 
sphere. The conjugate of any point M on the sphere is the diametrically 
opposite point M̃. In fact, given an extremal, all extremals with the same 
initial point M (and not just the neighboring extremals) intersect the given 
extremal at M̃. This property stems from the fact that a sphere has 
constant curvature, and is no longer true if the sphere is replaced by a 
"neighboring" ellipsoid (for example). 


We conclude this section by summarizing the necessary conditions for an 
extremum found so far: If the functional 


$$\int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a weak extremum for the curve y = y(x), then 


1. The curve y = y(x) is an extremal, i.e., satisfies Euler's equation 

$$F_y - \frac{d}{dx} F_{y'} = 0$$

(see Sec. 4); 


2. Along the curve y = y(x), $F_{y'y'} \ge 0$ for a minimum and $F_{y'y'} \le 0$ for 
a maximum (see Sec. 25); 


3. The interval (a, b) contains no points conjugate to a (see Sec. 27). 


28. Sufficient Conditions for a Weak Extremum 


In this section, we formulate a set of conditions which is sufficient for a 
functional of the form 


$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{44}$$


to have a weak extremum for the curve y = y(x). It should be noted that 
the sufficient conditions to be given below closely resemble the necessary 
conditions given at the end of the preceding section. The necessary con- 
ditions were considered separately, since each of them is necessary by itself. 
However, the sufficient conditions have to be considered as a set, since the 
presence of an extremum is assured only if all the conditions are satisfied 
simultaneously. 


THEOREM. Suppose that for some admissible curve y = y(x), the 
functional (44) satisfies the following conditions: 

1. The curve y = y(x) is an extremal, i.e., satisfies Euler's equation 

$$F_y - \frac{d}{dx} F_{y'} = 0;$$

2. Along the curve y = y(x), 

$$P(x) \equiv \tfrac{1}{2} F_{y'y'}[x, y(x), y'(x)] > 0$$

(the strengthened Legendre condition); 


3. The interval [a, b] contains no points conjugate to the point a (the 
strengthened Jacobi condition).¹⁵ 


Then the functional (44) has a weak minimum for y = y(x). 


Proof. If the interval [a, b] contains no points conjugate to a, and if 
P(x) > 0 in [a, b], then because of the continuity of the solution of 
Jacobi's equation and of the function P(x), we can find a larger interval 
[a, b + ε] which also contains no points conjugate to a, and such that 
P(x) > 0 in [a, b + ε]. Consider the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx - \alpha^2 \int_a^b h'^2\,dx, \tag{45}$$

with the Euler equation 

$$-\frac{d}{dx}\left[(P - \alpha^2)h'\right] + Qh = 0. \tag{46}$$

Since P(x) is positive in [a, b + ε] and hence has a positive (greatest) 
lower bound on this interval, and since the solution of (46) satisfying the 
initial conditions h(a) = 0, h'(a) = 1 depends continuously on the 
parameter α for all sufficiently small α, we have 


1. P(x) − α² > 0, a ≤ x ≤ b; 


2. The solution of (46) satisfying the initial conditions h(a) = 0, 
h'(a) = 1 does not vanish for a < x ≤ b. 


As shown in Theorem 1 of Sec. 26, these two conditions imply that the 
quadratic functional (45) is positive definite for all sufficiently small α. 
In other words, there exists a positive number c > 0 such that 

$$\int_a^b (Ph'^2 + Qh^2)\,dx > c \int_a^b h'^2\,dx. \tag{47}$$


15 The ordinary Jacobi condition states that the open interval (a, b) contains no points 
conjugate to a. Cf. Jacobi's necessary condition, p. 112. 
It is now an easy consequence of (47) that a minimum is actually 
achieved for the given extremal. In fact, if y = y(x) is the extremal 
and y = y(x) + h(x) is a sufficiently close neighboring curve, then, 
according to formula (12) of Sec. 25, 

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx, \tag{48}$$

where ξ(x), η(x) → 0 uniformly for a ≤ x ≤ b as $\|h\|_1 \to 0$. Moreover, 
using the Schwarz inequality, we have 

$$h^2(x) = \left(\int_a^x h'\,dx\right)^2 \le (x - a)\int_a^x h'^2\,dx \le (x - a)\int_a^b h'^2\,dx,$$

i.e., 

$$\int_a^b h^2\,dx \le \frac{(b - a)^2}{2}\int_a^b h'^2\,dx,$$

which implies that 

$$\left|\int_a^b (\xi h^2 + \eta h'^2)\,dx\right| \le \varepsilon\left(1 + \frac{(b - a)^2}{2}\right)\int_a^b h'^2\,dx \tag{49}$$

if |ξ(x)| ≤ ε, |η(x)| ≤ ε. Since ε > 0 can be chosen to be arbitrarily 
small, it follows from (47) and (49) that 

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx > 0$$

for all sufficiently small $\|h\|_1$. Therefore, the extremal y = y(x) 
actually corresponds to a weak minimum of the functional (44), in some 
sufficiently small neighborhood of y = y(x). This proves the theorem, 
thereby establishing sufficient conditions for a weak extremum in the 
case of the "simplest" variational problem. 
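
The two estimates used at the end of this proof can be spelled out in a little more detail. The following is a sketch of the intermediate steps, with the explicit threshold on ε read off from (47) and (49) rather than quoted from the text. Integrating the pointwise Schwarz bound over [a, b] produces the constant (b − a)²/2, and subtracting (49) from (47) shows that the increment is positive once ε is small compared with c:

```latex
\int_a^b h^2\,dx \;\le\; \int_a^b (x - a)\,dx \,\int_a^b h'^2\,dx
  \;=\; \frac{(b - a)^2}{2} \int_a^b h'^2\,dx,
\\[6pt]
J[y + h] - J[y] \;>\; \left[\, c - \varepsilon\left(1 + \frac{(b - a)^2}{2}\right) \right] \int_a^b h'^2\,dx \;>\; 0
\qquad \text{if } \varepsilon < \frac{c}{1 + (b - a)^2/2},\ \ h \not\equiv 0 .
```

Here the last integral is strictly positive because h(a) = 0 and h ≢ 0 together force h' ≢ 0.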


29. Generalization to n Unknown Functions 


The concept of a conjugate point and the related Jacobi conditions can 
be generalized to the case where the functional under consideration depends 
on n functions $y_1(x), \ldots, y_n(x)$. In this section we carry over to such 
functionals the definitions and results given earlier for functionals 
depending on a single function. To keep the notation simple, we write 

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{50}$$

as before, where now y denotes the n-dimensional vector $(y_1, \ldots, y_n)$ and y' 
the n-dimensional vector $(y_1', \ldots, y_n')$ [cf. Sec. 20]. By the scalar product 
(y, z) of two vectors 

$$y = (y_1, \ldots, y_n), \qquad z = (z_1, \ldots, z_n)$$

we mean, as usual, the quantity 

$$(y, z) = y_1 z_1 + \cdots + y_n z_n.$$


Whenever the transition from the case of a single function to the case of n 
functions is straightforward, we shall omit details. 


29.1. The second variation. The Legendre condition. If the increment 
ΔJ[h] of the functional (50), corresponding to the change from y to y + h,¹⁶ 
can be written in the form 

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon \|h\|^2,$$

where $\varphi_1[h]$ is a linear functional, $\varphi_2[h]$ is a quadratic functional, and ε → 0 
as ‖h‖ → 0, then $\varphi_2[h]$ is called the second variation of the original functional 
(50) and is denoted by $\delta^2 J[h]$.¹⁷ In the case of fixed end points, where 

$$h_i(a) = h_i(b) = 0 \qquad (i = 1, \ldots, n),$$

or more concisely, 

$$h(a) = h(b) = 0,$$

we easily find, applying Taylor's formula, that the second variation of (50) 
is given by 

$$\delta^2 J[h] = \frac{1}{2} \int_a^b \left[\sum_{i,k=1}^n F_{y_i y_k} h_i h_k + 2 \sum_{i,k=1}^n F_{y_i y_k'} h_i h_k' + \sum_{i,k=1}^n F_{y_i' y_k'} h_i' h_k'\right] dx. \tag{51}$$

Introducing the matrices 

$$F_{yy} = \|F_{y_i y_k}\|, \qquad F_{yy'} = \|F_{y_i y_k'}\|, \qquad F_{y'y'} = \|F_{y_i' y_k'}\|, \tag{52}$$

we can write (51) in the compact form 

$$\delta^2 J[h] = \frac{1}{2} \int_a^b \left[(F_{yy} h, h) + 2(F_{yy'} h, h') + (F_{y'y'} h', h')\right] dx, \tag{53}$$

where each term in the integrand is the scalar product of the vector h or h' 
and the vector obtained by applying one of the matrices (52) to h or h'. 
Then, integrating by parts, we can reduce (53) to the form 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{54}$$
16 The letter h denotes the vector $(h_1, \ldots, h_n)$, and ‖h‖ means 


>, max (AG)! + 1h GO1} = 2 Wl. 


17 Obviously, $\varphi_1[h]$ is the (first) variation of the functional (50). 
where P = P(x) and Q = Q(x) are the matrices 

$$P = \|P_{ik}\| = \frac{1}{2} F_{y'y'}, \qquad Q = \|Q_{ik}\| = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right).$$

In deriving (54), we assume that $F_{yy'}$ is a symmetric matrix,¹⁸ i.e., that 
$F_{y_i y_k'} = F_{y_k y_i'}$ for all i, k = 1, ..., n ($F_{yy}$ and $F_{y'y'}$ are automatically 
symmetric, because of the tacitly assumed smoothness of F). Just as in the case 
of one unknown function, it is easily verified that the term (Ph', h') makes 
the "main contribution" to the quadratic functional (54). More precisely, 
we have the following result: 


THEOREM 1. A necessary condition for the quadratic functional (54) to 
be nonnegative for all h(x) such that h(a) = h(b) = 0 is that the matrix P 
be nonnegative definite.¹⁹ 


29.2. Investigation of the quadratic functional (54). As in Sec. 26, we can 
investigate the functional (54) without reference to the original functional 
(50), assuming, however, that P and Q are symmetric matrices. As before 
(see Sec. 26), we begin by writing the system of Euler equations 

$$-\frac{d}{dx} \sum_{i=1}^n P_{ik} h_i' + \sum_{i=1}^n Q_{ik} h_i = 0 \qquad (k = 1, \ldots, n), \tag{55}$$

corresponding to the functional (54). The equations (55) can be written 
more concisely as 

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{56}$$

in terms of the matrices P and Q. 


DEFINITION 1. Let 

$$\begin{aligned}
h^{(1)} &= (h_{11}, h_{12}, \ldots, h_{1n}),\\
h^{(2)} &= (h_{21}, h_{22}, \ldots, h_{2n}),\\
&\;\;\vdots\\
h^{(n)} &= (h_{n1}, h_{n2}, \ldots, h_{nn})
\end{aligned} \tag{57}$$

be a set of n solutions of the system (55), where the ith solution satisfies 
the initial conditions²⁰ 

$$h_{ik}(a) = 0 \qquad (k = 1, \ldots, n) \tag{58}$$

and 

$$h_{ii}'(a) = 1, \qquad h_{ik}'(a) = 0 \qquad (k \ne i). \tag{59}$$

18 Without this assumption, which is unnecessarily restrictive, equations (54) and (55) 
become more complicated, but it can be shown that Theorems 1 and 2 remain valid 
(H. Niemeyer, private communication). 

19 This is the appropriate multidimensional generalization of the Legendre condition 
(14), p. 103. The matrix P = P(x) is said to be nonnegative definite (positive definite) 
if the quadratic form 

$$\sum_{i,k=1}^n P_{ik}(x)\,h_i(x)\,h_k(x) \qquad (a \le x \le b)$$

is nonnegative (positive) for all x in [a, b] and arbitrary $h_1(x), \ldots, h_n(x)$. 

20 Thus, the vectors $h^{(i)}(a)$ are the rows of the zero matrix of order n, and the vectors 
$h^{(i)\prime}(a)$ are the rows of the unit matrix of order n. 
Then the point ã (≠ a) is said to be conjugate to the point a if the 
determinant 

$$\begin{vmatrix}
h_{11}(x) & h_{12}(x) & \cdots & h_{1n}(x)\\
h_{21}(x) & h_{22}(x) & \cdots & h_{2n}(x)\\
\vdots & \vdots & & \vdots\\
h_{n1}(x) & h_{n2}(x) & \cdots & h_{nn}(x)
\end{vmatrix} \tag{60}$$

vanishes for x = ã. 


THEOREM 2. If P is a positive definite symmetric matrix, and if the 
interval [a, b] contains no points conjugate to a, then the quadratic functional 
(54) is positive definite for all h(x) such that h(a) = h(b) = 0. 

Proof. The proof of this theorem follows the same plan as the proof 
of Theorem 1 of Sec. 26. Let W be an arbitrary differentiable 
symmetric matrix. Then 

$$0 = \int_a^b \frac{d}{dx}(Wh, h)\,dx = \int_a^b (W'h, h)\,dx + 2\int_a^b (Wh, h')\,dx$$

for every vector h satisfying the boundary conditions h(a) = h(b) = 0. Therefore, 
we can add the expression 

$$(W'h, h) + 2(Wh, h')$$

to the integrand of (54), obtaining 

$$\int_a^b \left[(Ph', h') + 2(Wh, h') + (Qh, h) + (W'h, h)\right] dx, \tag{61}$$

without changing the value of (54). 

We now try to select a matrix W such that the integrand of (61) is a 
perfect square. This will be the case if W is chosen to be a solution of 
the equation²¹ 

$$Q + W' = WP^{-1}W, \tag{62}$$

which we call the matrix Riccati equation (cf. p. 108). In fact, if we 
use (62), the integrand of (61) becomes 

$$(Ph', h') + 2(Wh, h') + (WP^{-1}Wh, h). \tag{63}$$

Since P is a positive definite symmetric matrix, the square root $P^{1/2}$ 
exists, is itself positive definite and symmetric, and has the inverse 
$P^{-1/2}$. Therefore, we can write (63) as the "perfect square" 

$$(P^{1/2}h' + P^{-1/2}Wh,\; P^{1/2}h' + P^{-1/2}Wh).$$


[Recall that if T is a symmetric matrix, (Ty, z) = (y, Tz) for any 
vectors y and z.] Repeating the argument given in the case of a scalar 
function h (see p. 107), we can show that 

$$P^{1/2}h' + P^{-1/2}Wh$$

cannot vanish for all x in [a, b] unless h ≡ 0. It follows that if the 
matrix Riccati equation (62) has a solution W defined on the whole 
interval [a, b], then, with this choice of W, the functional (61), and hence 
the functional (54), is positive definite. 

21 It can be shown that this is compatible with W being symmetric, even when $F_{yy'}$ 
fails to be symmetric and (62) is replaced by a more general equation (H. Niemeyer, 
private communication). 

Thus, the proof of the theorem reduces to showing that the absence 
of points in [a, b] which are conjugate to a guarantees that (62) has a 
solution defined on the whole interval [a, b]. Making the substitution 

$$W = -PU'U^{-1} \tag{64}$$

in (62), where U is a new unknown matrix [cf. (32)], we obtain the 
equation 

$$-\frac{d}{dx}(PU') + QU = 0, \tag{65}$$

which is just the matrix form of equation (56). The solution of (65) 
satisfying the initial conditions 

$$U(a) = \Theta, \qquad U'(a) = I,$$

where Θ is the zero matrix and I the unit matrix of order n, is precisely 
the set of solutions (57) of the system (55) which satisfy the initial 
conditions (58) and (59) [cf. footnote 19, p. 119]. If [a, b] contains 
no points conjugate to a, we can show that (65) has a solution U(x) 
whose determinant does not vanish anywhere in [a, b],²² and then 
there exists a solution of (62), given by (64), which is defined on the 
whole interval [a, b]. In other words, we can actually find a matrix W 
which converts the integrand of the functional (61) into a perfect square, 
in the way described. This completes the proof of the theorem. 
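
The passage from the matrix Riccati equation to the linear matrix equation can be checked directly. The computation below is a sketch of that check, assuming U is differentiable and invertible on [a, b] and using the standard identity (U⁻¹)' = −U⁻¹U'U⁻¹; it is supplied here as a verification and is not quoted from the text:

```latex
W = -P U' U^{-1}
\;\Longrightarrow\;
W' = -(P U')' U^{-1} + P U' U^{-1} U' U^{-1},
\qquad
W P^{-1} W = P U' U^{-1} P^{-1} P U' U^{-1} = P U' U^{-1} U' U^{-1},
\\[6pt]
Q + W' = W P^{-1} W
\;\Longleftrightarrow\;
Q - (P U')' U^{-1} = 0
\;\Longleftrightarrow\;
-\frac{d}{dx}(P U') + Q U = 0 .
```

The quadratic terms cancel exactly, which is why the nonlinear equation for W becomes linear in U.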


Next we show, as in Sec. 26, that the absence of points conjugate to a 
in the interval [a, b] is not only sufficient but also necessary for the functional 
(54) to be positive definite. 


Lemma. If 

$$h(x) = (h_1(x), \ldots, h_n(x))$$

satisfies the system (55) and the boundary conditions 

$$h(a) = h(b) = 0, \tag{66}$$

then 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx = 0.$$

22 The fact that det P does not vanish in [a, b] is tacitly assumed, but this is guaranteed 
by the positive definiteness of P (cf. footnote 9, p. 108). 
Proof. The lemma is an immediate consequence of the formula 

$$0 = \int_a^b \left(-\frac{d}{dx}(Ph') + Qh,\; h\right) dx = \int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

which is obtained by integrating by parts and using (66). 


THEOREM 3. If the quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{67}$$

where P is a positive definite symmetric matrix, is positive definite for 
all h(x) such that h(a) = h(b) = 0, then the interval [a, b] contains no 
points conjugate to a. 


Proof. The proof of this theorem follows the same plan as the proof 
of the corresponding theorem for the case of one unknown function 
(Theorem 2 of Sec. 26). We consider the positive definite quadratic 
functional 

$$\int_a^b \left\{t\left[(Ph', h') + (Qh, h)\right] + (1 - t)(h', h')\right\} dx. \tag{68}$$

The system of Euler equations corresponding to (68) is 

$$-\frac{d}{dx}\left[t \sum_{i=1}^n P_{ik} h_i' + (1 - t) h_k'\right] + t \sum_{i=1}^n Q_{ik} h_i = 0 \qquad (k = 1, \ldots, n) \tag{69}$$

[cf. (37)], which for t = 1 reduces to the system (55), and for t = 0 
reduces to the system 

$$h_k'' = 0 \qquad (k = 1, \ldots, n).$$

Suppose the interval [a, b] contains a point ã conjugate to a, i.e., suppose 
the determinant (60) vanishes for x = ã. Then there exists a linear 
combination h(x) of the solutions (57) which is not identically zero such 
that h(ã) = 0. Moreover, there exists a nontrivial solution h(x, t) of 
the system (69) which depends continuously on t and reduces to h(x) 
for t = 1. It is clear that ã ≠ b, since otherwise, according to the lemma, 
the positive definite functional (67) would vanish for h(x) ≢ 0, which 
is impossible. The fact that ã cannot be an interior point of [a, b] is 
proved by the same kind of argument as used in Theorem 2 of Sec. 26, 
for the case of a scalar function h(x). Further details are left to the 
reader. 


Suppose now that we only require that the functional (67) be nonnegative. 
Then, by the same argument as used to prove Theorem 2’ of Sec. 26, we have 


THEOREM 3'. If the quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

where P is a positive definite symmetric matrix, is nonnegative for all h(x) 
such that h(a) = h(b) = 0, then the interval [a, b] contains no interior 
points conjugate to a. 


Finally, combining Theorems 2 and 3, we obtain 


THEOREM 4. The quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

where P is a positive definite symmetric matrix, is positive definite for all 
h(x) such that h(a) = h(b) = 0 if and only if the interval [a, b] contains 
no point conjugate to a. 


29.3. Jacobi's necessary condition. More on conjugate points. We now 
apply the results just obtained to the original functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = M_0, \quad y(b) = M_1, \tag{70}$$

where $M_0$ and $M_1$ are two fixed points, recalling that the second variation of 
(70) is given by 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{71}$$

where 

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{72}$$

DEFINITION 2. The system of Euler equations 

$$-\frac{d}{dx} \sum_{i=1}^n P_{ik} h_i' + \sum_{i=1}^n Q_{ik} h_i = 0 \qquad (k = 1, \ldots, n),$$

or more concisely 

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{73}$$

of the quadratic functional (71) is called the Jacobi system of the original 
functional (70).²³ 


DEFINITION 3. The point @ is said to be conjugate to the point a with 
respect to the functional (70) if it is conjugate to a with respect to the 
quadratic functional (71) which is the second variation of the functional 
(70), i.e., if it is conjugate to a in the sense of Definition 1, p. 119. 


Since nonnegativity of the second variation is a necessary condition for 
the functional (70) to have a minimum (see Theorem 1 of Sec. 24), Theorem 3' 
immediately implies 


23 Equations (70)-(73) closely resemble equations (39)-(42) of Sec. 27, except that 
h, h' are now vectors, and P, Q are now matrices. 

THEOREM 5 (Jacobi's necessary condition). If the extremal 

$$y_1 = y_1(x), \ldots, y_n = y_n(x)$$

corresponds to a minimum of the functional (70), and if the matrix 

$$F_{y'y'}[x, y(x), y'(x)]$$

is positive definite along this extremal, then the open interval (a, b) contains 
no points conjugate to a. 


So far, we have said that the point ã is conjugate to a if the determinant 
formed from n linearly independent solutions of the Jacobi system, satisfying 
certain initial conditions, vanishes for x = ã. As in the case n = 1, this 
basic definition is equivalent to two others, which involve only extremals of 
the functional (70), and not solutions of the Jacobi system: 


DEFINITION 4. Suppose n neighboring extremals 

$$y_1 = y_{i1}(x), \ldots, y_n = y_{in}(x) \qquad (i = 1, \ldots, n)$$

start from the same n-dimensional point, with directions which are close 
together but linearly independent. Then the point ã is said to be conjugate 
to the point a if the value of the determinant 

$$\begin{vmatrix}
y_{11}(x) & y_{12}(x) & \cdots & y_{1n}(x)\\
y_{21}(x) & y_{22}(x) & \cdots & y_{2n}(x)\\
\vdots & \vdots & & \vdots\\
y_{n1}(x) & y_{n2}(x) & \cdots & y_{nn}(x)
\end{vmatrix}$$

for x = ã is an infinitesimal whose order is higher than that of its values 
for a < x < ã. 


In the next definition, we enlarge the meaning of a conjugate point to 
apply to points lying on extremals (cf. footnote 14, p. 114). 
DEFINITION 5. Given an extremal γ with equations 

$$y_1 = y_1(x), \ldots, y_n = y_n(x),$$

the point 

$$\tilde M = (\tilde a, y_1(\tilde a), \ldots, y_n(\tilde a))$$

is said to be conjugate to the point 

$$M = (a, y_1(a), \ldots, y_n(a))$$

if γ has a sequence of neighboring extremals drawn from the same initial 
point M, such that each neighboring extremal intersects γ and the points 
of intersection have M̃ as their limit. 


The equivalence of all these definitions of a conjugate point is proved by 
using considerations similar to those given for the case of a single unknown 
function (see Sec. 27). 
29.4. Sufficient conditions for a weak extremum. Theorem 2 and an 
argument like that used to prove the corresponding theorem of Sec. 28 
(for the scalar case) imply 


THEOREM 6. Suppose that for some admissible curve γ with equations 

$$y_1 = y_1(x), \ldots, y_n = y_n(x),$$

the functional (70) satisfies the following conditions: 


1. The curve γ is an extremal, i.e., satisfies the system of Euler equations 

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n);$$

2. Along γ the matrix 

$$P(x) = \tfrac{1}{2} F_{y'y'}[x, y(x), y'(x)]$$

is positive definite; 

3. The interval [a, b] contains no points conjugate to the point a. 


Then the functional (70) has a weak minimum for the curve γ. 


30. Connection between Jacobi’s Condition and the Theory of 
Quadratic Forms™ 


According to Theorem 3 of Sec. 26, the quadratic functional
$$ \int_a^b (P h'^2 + Q h^2)\,dx, \tag{74} $$
where
$$ P(x) > 0 \qquad (a \le x \le b), $$
is positive definite for all $h(x)$ such that $h(a) = h(b) = 0$ if and only if the
interval [a, b] contains no points conjugate to a.²⁵ The functional (74)
is the infinite-dimensional analog of a quadratic form. Therefore, to obtain
conditions for (74) to be positive definite, it is natural to start from the
conditions for a quadratic form defined on an n-dimensional space to be
positive definite, and then take the limit as $n \to \infty$.

This may be done as follows: By introducing the points
$$ a = x_0, x_1, \ldots, x_n, x_{n+1} = b, $$
we divide the interval [a, b] into n + 1 equal parts of length
$$ \Delta x = x_{i+1} - x_i = \frac{b - a}{n + 1}. $$


24 Like Sec. 29, this section is written in a somewhat more concise style than the rest of
the book, and can be omitted without loss of continuity.
25 This is the strengthened Jacobi condition (see p. 116).
Then we consider the quadratic form
$$ \sum_{i=0}^{n} \left[ P_i \left( \frac{h_{i+1} - h_i}{\Delta x} \right)^2 + Q_i h_i^2 \right] \Delta x, \tag{75} $$
where $P_i$, $Q_i$ and $h_i$ are the values of the functions P(x), Q(x) and h(x) at
the point $x = x_i$. This quadratic form is a “finite-dimensional approximation”
to the functional (74). Grouping similar terms and bearing in mind that
$$ h_0 = h(a) = 0, \qquad h_{n+1} = h(b) = 0, $$
we can write (75) as
$$ \sum_{i=1}^{n} \left[ \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) h_i^2 - 2 \frac{P_{i-1}}{\Delta x} h_{i-1} h_i \right]. \tag{76} $$
In other words, the quadratic functional (74) can be approximated by a
quadratic form in n variables $h_1, \ldots, h_n$, with the n x n matrix


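$$ \begin{pmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-2} & a_{n-1} & b_{n-1} \\
0 & 0 & 0 & \cdots & 0 & b_{n-1} & a_n
\end{pmatrix}, \tag{77} $$
where
$$ a_i = Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \qquad (i = 1, \ldots, n) \tag{78} $$
and
$$ b_i = -\frac{P_i}{\Delta x} \qquad (i = 1, \ldots, n - 1). \tag{79} $$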

A symmetric matrix like (77), all of whose elements vanish except those 
appearing on the principal diagonal and on the two adjoining diagonals, 
is called a Jacobi matrix, and a quadratic form with such a matrix is called 
a Jacobi form. For any Jacobi matrix, there is a recurrence relation between 
the descending principal minors, i.e., between the determinants 


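$$ D_i = \begin{vmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{i-2} & a_{i-1} & b_{i-1} \\
0 & 0 & 0 & \cdots & 0 & b_{i-1} & a_i
\end{vmatrix}, \tag{80} $$
where $i = 1, \ldots, n$. In fact, expanding $D_i$ with respect to the elements of
the last row, we obtain the recursion relation
$$ D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}, \tag{81} $$
which allows us to determine the minors $D_3, \ldots, D_n$ in terms of the first two
minors $D_1$ and $D_2$. Moreover, if we set $D_0 = 1$, $D_{-1} = 0$, then (81) is
valid for all $i = 1, \ldots, n$, and uniquely determines $D_1, \ldots, D_n$.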
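
The recurrence (81) is easy to try out numerically. The following is a minimal Python sketch added for illustration; the function name `descending_minors` and the representation of the diagonals as two lists are assumptions of this sketch, not notation from the text.

```python
def descending_minors(a, b):
    """Return [D_1, ..., D_n] for the Jacobi matrix with main diagonal
    a = [a_1, ..., a_n] and adjoining diagonals b = [b_1, ..., b_{n-1}],
    using the recurrence D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2} of (81)."""
    d_prev2, d_prev1 = 0.0, 1.0               # D_{-1} = 0, D_0 = 1
    minors = []
    for i, a_i in enumerate(a):               # i = 0 corresponds to D_1
        # For D_1 the coefficient b_0 is irrelevant: it multiplies D_{-1} = 0.
        b_prev = b[i - 1] if i > 0 else 0.0
        d = a_i * d_prev1 - b_prev ** 2 * d_prev2
        minors.append(d)
        d_prev2, d_prev1 = d_prev1, d
    return minors


# Example: the 3 x 3 Jacobi matrix with a_i = 2 and b_i = -1.
print(descending_minors([2, 2, 2], [-1, -1]))
```

For this small matrix the minors are 2, 3 and 4, all positive, so by the Sylvester criterion the corresponding Jacobi form is positive definite.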

According to a familiar result, sometimes called the Sylvester criterion,
a quadratic form
$$ \sum_{i,k=1}^{n} a_{ik} \xi_i \xi_k \qquad (a_{ik} = a_{ki}) $$
is positive definite if and only if the descending principal minors
$$ a_{11}, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad \det \|a_{ik}\| $$
of the matrix $\|a_{ik}\|$ are all positive.²⁶ Applied to the present problem,
this criterion states that the Jacobi form (76), with matrix (77), is positive
definite if and only if all the quantities defined by (81) are positive, where
$i = 1, \ldots, n$ and $D_0 = 1$, $D_{-1} = 0$.

We now use this result to obtain a criterion for the quadratic functional
(74) to be positive definite. Thus, we examine what happens to the recurrence
relation (81) as $n \to \infty$. Substituting for the coefficients $a_i$ and $b_i$
from (78) and (79), we can write (81) in the form
$$ D_i = \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) D_{i-1} - \frac{P_{i-1}^2}{(\Delta x)^2} D_{i-2} \qquad (i = 1, \ldots, n). \tag{82} $$
It is obviously impossible to pass directly to the limit $n \to \infty$ (i.e., $\Delta x \to 0$)
in (82), since then the coefficients of $D_{i-1}$ and $D_{i-2}$ become infinite. To
avoid this difficulty, we make the “change of variables”²⁷
$$ D_i = \frac{P_1 \cdots P_i Z_{i+1}}{(\Delta x)^{i+1}} \qquad (i = 1, \ldots, n), $$
$$ D_0 = \frac{Z_1}{\Delta x} = 1, \tag{83} $$
$$ D_{-1} = Z_0 = 0. $$


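26 See e.g., G. E. Shilov, op. cit., Theorem 27, p. 131.
27 Substituting the expressions (78) and (79) into (80), we find by direct calculation
that $D_i$ is of order $(\Delta x)^{-i}$, and hence that $Z_i$ is of order $\Delta x$.
In terms of the variables $Z_i$, the recurrence relation (82) becomes
$$ \frac{P_1 \cdots P_i Z_{i+1}}{(\Delta x)^{i+1}} = \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) \frac{P_1 \cdots P_{i-1} Z_i}{(\Delta x)^i} - \frac{P_{i-1}^2}{(\Delta x)^2} \frac{P_1 \cdots P_{i-2} Z_{i-1}}{(\Delta x)^{i-1}}, $$
i.e.,
$$ Q_i Z_i (\Delta x)^2 + P_{i-1} Z_i + P_i Z_i - P_i Z_{i+1} - P_{i-1} Z_{i-1} = 0 $$
or
$$ Q_i Z_i - \frac{1}{\Delta x} \left( P_i \frac{Z_{i+1} - Z_i}{\Delta x} - P_{i-1} \frac{Z_i - Z_{i-1}}{\Delta x} \right) = 0 \qquad (i = 1, \ldots, n). \tag{84} $$
Passing to the limit $\Delta x \to 0$ in (84), we obtain the differential equation
$$ -\frac{d}{dx}(P Z') + Q Z = 0, \tag{85} $$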

which is just the Jacobi equation! 

The condition that the quantities $D_i$ satisfying the relation (82) be positive
is equivalent to the condition that the quantities $Z_i$ satisfying the difference
equation (84) be positive, since the factor
$$ \frac{P_1 \cdots P_i}{(\Delta x)^{i+1}} $$
is always positive [because of the condition P(x) > 0]. Thus, we have proved
that the quadratic form (76) is positive definite if and only if all but the first
of the n + 2 quantities $Z_0, Z_1, \ldots, Z_{n+1}$ satisfying the difference equation
(84) are positive.²⁸

If we consider the polygonal line $\Pi_n$ with vertices
$$ (a, Z_0), \quad (x_1, Z_1), \quad \ldots, \quad (b, Z_{n+1}) $$
[recall that $a = x_0$, $b = x_{n+1}$], the condition that $Z_0 = 0$ and $Z_i > 0$ for
$i = 1, \ldots, n + 1$ means that $\Pi_n$ does not intersect the interval [a, b] except
at the end point a. As $\Delta x \to 0$, the difference equation (84) goes into
the Jacobi differential equation (85), and the polygonal line $\Pi_n$ goes into a
nontrivial solution of (85) which satisfies the initial condition
$$ Z(a) = Z_0 = 0, \qquad Z'(a) = \lim_{\Delta x \to 0} \frac{Z_1 - Z_0}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta x}{\Delta x} = 1 $$
and does not vanish for $a < x \le b$. In other words, as $n \to \infty$, the Jacobi
form (76) goes into the quadratic functional (74), and the condition that (76)


28 Note that $Z_0 = 0$, $Z_1 = \Delta x > 0$, according to (83). Note also that these two
equations, together with the n equations (84), form a system of n + 2 independent
linear equations in n + 2 unknowns, and that such a system always has a unique
solution.
be positive definite goes into precisely the condition for (74) to be positive
definite given in Theorem 3 of Sec. 26, i.e., the condition that [a, b] contain
no points conjugate to a. The legitimacy of this passage to the limit can
be made completely rigorous, but we omit the details.
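
The finite-dimensional criterion can be tried out directly. The sketch below (an illustration added here, with a hypothetical function name and a fixed, finite n, so it only approximates the limiting statement) iterates the difference equation (84) in the form $P_i Z_{i+1} = (Q_i (\Delta x)^2 + P_{i-1} + P_i) Z_i - P_{i-1} Z_{i-1}$, starting from $Z_0 = 0$, $Z_1 = \Delta x$. Working with $Z_i$ rather than $D_i$ also avoids the growth of $D_i$ like $(\Delta x)^{-i}$. The test case $P \equiv 1$, $Q \equiv -1$ is an assumption chosen because the Jacobi equation then has the solution $\sin(x - a)$, whose first zero after a is $a + \pi$.

```python
def jacobi_form_is_positive_definite(P, Q, a, b, n):
    """Check that Z_1, ..., Z_{n+1} are all positive, where the Z_i satisfy
    P_i Z_{i+1} = (Q_i dx^2 + P_{i-1} + P_i) Z_i - P_{i-1} Z_{i-1}   (from (84)),
    with Z_0 = 0 and Z_1 = dx as in (83).  P must be positive on [a, b]."""
    dx = (b - a) / (n + 1)
    z_prev, z = 0.0, dx                        # Z_0 and Z_1
    for i in range(1, n + 1):
        x_i = a + i * dx
        p_prev, p_i, q_i = P(x_i - dx), P(x_i), Q(x_i)
        z_next = ((q_i * dx ** 2 + p_prev + p_i) * z - p_prev * z_prev) / p_i
        if z_next <= 0.0:
            return False                       # the polygonal line reaches the x-axis
        z_prev, z = z, z_next
    return True


P = lambda x: 1.0
Q = lambda x: -1.0
print(jacobi_form_is_positive_definite(P, Q, 0.0, 3.0, 200))   # interval shorter than pi
print(jacobi_form_is_positive_definite(P, Q, 0.0, 4.0, 200))   # interval longer than pi
```

With these inputs the first call reports a positive definite form and the second does not, in agreement with the location of the conjugate point at $x = \pi$.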


PROBLEMS 


1. Calculate the second variation of each of the following functionals:
a) $J[y] = \int_a^b F(x, y)\,dx$;
b) $J[y] = \int_a^b F(x, y, y', \ldots, y^{(n)})\,dx$;
c) $J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy$.
2. Show that the second variation of a linear functional is zero. State and
prove a converse result.


3. Prove that a quadratic functional is twice differentiable, and find its first 
and second variations. 


4. Calculate the second variation of the functional
$$ e^{J[y]}, $$
where J[y] is a twice differentiable functional.
Ans. $\delta^2(e^{J[y]}) = [(\delta J)^2 + \delta^2 J]\, e^{J[y]}$.


5. Give an example showing that in Theorem 2 of Sec. 24, we cannot replace
the condition that $\delta^2 J[h]$ be strongly positive by the condition that $\delta^2 J[h] > 0$.


6. Derive the analog of Legendre’s necessary condition for functionals of the
form
$$ J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy, $$
where u vanishes on the boundary of R.
Ans. The matrix
$$ \begin{Vmatrix} F_{u_x u_x} & F_{u_x u_y} \\ F_{u_y u_x} & F_{u_y u_y} \end{Vmatrix} $$
should be nonnegative definite (cf. p. 119).
7. For which values of a and b is the quadratic functional
$$ \int_0^a [f'^2(x) - b f^2(x)]\,dx $$


nonnegative for all f(x) such that f(0) = f(a) = 0? Deduce an inequality 
from the answer. 


8. Show that the extremals of any functional of the form 


$$ \int_a^b F(x, y')\,dx $$


have no conjugate points.
9. Prove that if a family of extremals drawn from a given point A has an
envelope E, then the point where a given extremal touches E is a conjugate
point of A.


10. Investigate the extremals of the functional 


Ji =| 2a4x, yO = 1,1) = 4, 


where 0 < a, 0 < A < 1. Show that two extremals go through every pair 
of points (0, 1) and (a, A). Which of these two extremals corresponds to 
a weak minimum? 


Hint. The line x = 0 is an envelope of the family of extremals.


11. Prove that the extremal $y = y_1 x / x_1$ corresponds to a weak minimum of
both functionals


where $y(0) = 0$, $y(x_1) = y_1$, $x_1 > 0$, $y_1 > 0$.


12. What is the restriction on a if the functional 
[2 - dz, 0) = 0 ya) = 0 


is to satisfy the strengthened Jacobi condition? Use two approaches, one 
based on Jacobi’s equation (42) and the other based on Definition 4 (p. 114) 
of a conjugate point. 


13. Is the strengthened Jacobi condition satisfied by the functional 
Ipl= [O24 + Adz, 0) =0, way =0 
for arbitrary a? 


Ans. Yes. 


14. Let $y = y(x, \alpha, \beta)$ be a general solution of Euler’s equation, depending on
two parameters $\alpha$ and $\beta$. Prove that if the ratio
$$ \frac{\partial y / \partial \alpha}{\partial y / \partial \beta} $$
is the same at two points, the points are conjugate.


15. Consider the catenary
$$ y = c \cosh\left( \frac{x + b}{c} \right), $$
where b and c are constants. Show that any point on the catenary except
the vertex (-b, c) has one and only one conjugate, and show that the tangents
to any pair of conjugate points intersect on the x-axis.


6 


FIELDS. 
SUFFICIENT CONDITIONS 
FOR A STRONG EXTREMUM 


In our study of sufficient conditions for a weak extremum, we introduced 
the important concept of a conjugate point. The simplest and most natural 
way to introduce this concept is based on the use of families of neighboring 
extremals (see Sec. 27). Then the conjugate of a point M lying on an extremal
$\gamma$ is defined as the limit of the points of intersection of $\gamma$ with the neighboring
extremals drawn from M.

The utility of studying families of extremals rather than individual extremals 
is particularly apparent when we turn our attention to the problem of finding 
sufficient conditions for a strong extremum. The study of such families of 
extremals is intimately connected with the important concept of a field, 
which we introduce in the next section. Since the concept of a field is 
useful in many problems, we first give a general definition of a field, which is 
not directly related to variational problems. 


31. Consistent Boundary Conditions. General Definition 
of a Field 


Consider a system of second-order differential equations 


$$ y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \qquad (i = 1, \ldots, n), \tag{1} $$


solved explicitly for the second derivatives. In order to single out a definite
solution of this system, we have to specify 2n conditions, e.g., boundary
conditions of the form
$$ y_i' = \psi_i(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{2} $$


for two values of x, say $x_1$ and $x_2$. Boundary conditions of this kind are
commonly encountered in variational problems. If we require that the 
boundary conditions (2) hold only at one point, they determine a solution 
of the system (1) which depends on n parameters. 

We now introduce the following definitions: 


DEFINITION 1. The boundary conditions
$$ y_i' = \psi_i^{(1)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{3} $$
prescribed for $x = x_1$, and the boundary conditions
$$ y_i' = \psi_i^{(2)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{4} $$
prescribed for $x = x_2$, are said to be (mutually) consistent if every solution
of the system (1) satisfying the boundary conditions (3) at $x = x_1$ also
satisfies the boundary conditions (4) at $x = x_2$, and conversely.¹


DEFINITION 2. Suppose the boundary conditions
$$ y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{5} $$
(where the $\psi_i$ are continuously differentiable functions) are prescribed for
every x in the interval [a, b], and suppose they are consistent for every
pair of points $x_1$, $x_2$ in [a, b]. Then the family of mutually consistent
boundary conditions (5) is called a field (of directions) for the given
system (1).


As is clear from (5), boundary conditions prescribed for every value of x 
define a system of first-order differential equations. The requirement that 
the boundary conditions be consistent for different values of x means that 
the solutions of the system (5) must also satisfy the system (1), i.e., that (1)
is implied by (5).

Because of the existence and uniqueness theorem for systems of differential
equations,² one and only one integral curve of the system (5) passes through


1 Thus, one might say that the boundary conditions at $x_1$ can be replaced by the boundary
conditions at $x_2$ which are consistent with those at $x_1$. In a boundary value
problem, the boundary conditions represent the influence of the external medium.
But in every concrete problem, we are at liberty to decide what is taken to be the external 
medium and what is taken to be the system under consideration. For example, in 
studying a vibrating string, subject to certain boundary conditions at its end points, 
we can focus our attention on a part of the string, instead of the whole string, regarding 
the rest of the string as part of the external medium and replacing the effect of the
“discarded” part of the string by suitable boundary conditions at the end points of the
“retained” part of the string.

2 See e.g., E. A. Coddington, op. cit., Chap. 6.




each point $(x, y_1, \ldots, y_n)$ of the region R where the functions $\psi_i(x, y_1, \ldots, y_n)$
are defined. According to what has just been said, each of these curves is
at the same time a solution of the system (1). Thus, specifying a field (5)
of the system (1) in some region R defines an n-parameter family of solutions
of (1), such that one and only one curve from the family passes through each
point of R. The curves of the family will be called trajectories of the field.³

The following theorem gives conditions which must be satisfied by the
functions $\psi_i(x, y_1, \ldots, y_n)$, $1 \le i \le n$, if the system (5) is to be a field
for the system (1):


THEOREM. The first-order system
$$ y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (a \le x \le b;\ 1 \le i \le n) \tag{6} $$
is a field for the second-order system
$$ y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \tag{7} $$
if and only if the functions $\psi_i(x, y_1, \ldots, y_n)$ satisfy the following system
of partial differential equations, called the Hamilton-Jacobi system⁴ for
the original system (7):
$$ \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k} \psi_k = f_i(x, y_1, \ldots, y_n, \psi_1, \ldots, \psi_n). \tag{8} $$
Thus, every solution of the Hamilton-Jacobi system (8) gives a field for
the original system (7).


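Proof. Differentiating (6) with respect to x, we obtain
$$ \frac{d^2 y_i}{dx^2} = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k} \frac{dy_k}{dx}, $$
i.e.,
$$ y_i'' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k} \psi_k. $$
Thus, the system (7) is a consequence of the system (6) if and only if
(8) holds.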


Example 1. Consider a single linear differential equation
$$ y'' = p(x) y. \tag{9} $$


3 A field is usually defined not as a family of boundary conditions which are compatible 
at every two points, but as a set of integral curves of the system (1) which satisfy the 
conditions (5) at every point, i.e., as a general solution of the system (5). However, 
it seems to us that our definition has certain advantages, in particular, when applying 
the concept of a field to variational problems involving multiple integrals. 

4 For an explanation of the connection between the system (8) and the Hamilton-Jacobi
equation defined in Chapter 4, see the remark on p. 143.
The corresponding Hamilton-Jacobi system reduces to a single equation
$$ \frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial y} \psi = p(x) y, $$
i.e.,
$$ \frac{\partial \psi}{\partial x} + \frac{1}{2} \frac{\partial (\psi^2)}{\partial y} = p(x) y. \tag{10} $$
The set of solutions of (10) depends on an arbitrary function, and according
to the theorem, each of these solutions is a field for equation (9).
The simplest solutions of (10) are those that are linear in y:
$$ \psi(x, y) = \alpha(x) y. \tag{11} $$
Substituting (11) into (10), we obtain
$$ \alpha'(x) y + \alpha^2(x) y = p(x) y. $$
Thus, $\alpha(x)$ satisfies the Riccati equation
$$ \alpha'(x) + \alpha^2(x) = p(x). \tag{12} $$
Solving (12) and setting
$$ y' = \alpha(x) y, $$
we obtain a field (which is linear in y) for the differential equation (9).
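
As a concrete check of Example 1 (an illustration added here, not taken from the original text), suppose the coefficient in (9) is the constant $p(x) \equiv 1$. The symbols C and K below are arbitrary constants introduced only for this sketch.

```latex
% Assumption: p(x) \equiv 1 in equation (9), so that (9) reads y'' = y.
\begin{align*}
&\text{Riccati equation (12):} && \alpha' + \alpha^2 = 1, \\
&\text{a one-parameter family of solutions:} && \alpha(x) = \tanh(x + C), \\
&\text{field (11):} && y' = \tanh(x + C)\, y, \\
&\text{trajectories of the field:} && y = K \cosh(x + C), \\
&\text{check against (9):} && y'' = K \cosh(x + C) = y .
\end{align*}
```

Each choice of C gives a different field for the same equation, and the constant solutions $\alpha = \pm 1$ of the Riccati equation give the fields whose trajectories are $y = K e^{\pm x}$.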


Example 2. In the same way, we can find the simplest field for a system
of linear differential equations
$$ Y'' = P(x) Y, \tag{13} $$
where $Y = (y_1, \ldots, y_n)$ and $P(x) = \|p_{ik}(x)\|$ is a matrix. The system of
Hamilton-Jacobi equations corresponding to (13) is
$$ \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n} \frac{\partial \psi_i}{\partial y_k} \psi_k = \sum_{k=1}^{n} p_{ik}(x) y_k \qquad (i = 1, \ldots, n). \tag{14} $$
Let us look for a solution of (14) which is linear in Y, i.e.,
$$ \psi_i(x, y_1, \ldots, y_n) = \sum_{k=1}^{n} \alpha_{ik}(x) y_k, \tag{15} $$
or in vector notation,
$$ \Psi = A Y. $$
Substituting (15) into (14), we obtain
$$ \sum_{k=1}^{n} \alpha_{ik}'(x) y_k + \sum_{k=1}^{n} \alpha_{ik}(x) \sum_{j=1}^{n} \alpha_{kj}(x) y_j = \sum_{k=1}^{n} p_{ik}(x) y_k, $$
or in matrix form
$$ \left[ \frac{d}{dx} A(x) \right] Y + A^2(x) Y = P(x) Y, $$
where $A = \|\alpha_{ik}\|$. Thus, if the matrix A(x) satisfies the equation
$$ \frac{d}{dx} A(x) + A^2(x) = P(x), $$
which it is natural to call a matrix Riccati equation (cf. p. 120), the functions
(15) define a field for the system (13), and this field is linear in Y.

It is worth noting, although this observation will not be needed later, 
that the concept of a field is intimately related to the solution of boundary 
value problems for systems of second-order differential equations by the 
so-called “sweep method.” We illustrate this method by considering the
very simple case where the system consists of a single linear differential
equation
$$ y''(x) = p(x) y(x) + f(x), \tag{16} $$
with the boundary conditions
$$ y'(a) = c_0 y(a) + d_0, \tag{17} $$
$$ y'(b) = c_1 y(b) + d_1. \tag{18} $$
We begin by constructing the first-order differential equation
$$ y'(x) = \alpha(x) y(x) + \beta(x) \tag{19} $$


and requiring that all its solutions satisfy the boundary condition (17) and 
the original equation (16). Obviously, to meet the first requirement, we 
must set 


$$ \alpha(a) = c_0, \qquad \beta(a) = d_0. \tag{20} $$
To meet the second requirement, we differentiate (19), obtaining
$$ y''(x) = \alpha'(x) y(x) + \alpha(x) y'(x) + \beta'(x). $$
Substituting (19) for y'(x) in the right-hand side, we find that
$$ y''(x) = [\alpha'(x) + \alpha^2(x)] y(x) + \beta'(x) + \alpha(x) \beta(x), $$
from which it is clear that (19) implies (16) if
$$ \alpha'(x) + \alpha^2(x) = p(x), \qquad \beta'(x) + \alpha(x) \beta(x) = f(x). \tag{21} $$


Now let $\alpha(x)$ and $\beta(x)$ be a solution of the system (21), satisfying the
initial conditions (20). Once we have found $\alpha(x)$ and $\beta(x)$, we can write a
“boundary condition”
$$ y'(x_0) = \alpha(x_0) y(x_0) + \beta(x_0) $$
for every point $x_0$ in [a, b]. This process of shifting the boundary condition
originally prescribed for x = a over to every other point in the interval
[a, b] is called the “forward sweep.” In particular, setting x = b, we obtain
the equation
$$ y'(b) = \alpha(b) y(b) + \beta(b), $$
which, together with the boundary condition (18), forms a system determining
y(b) and y'(b). If these values are uniquely determined, our original boundary
value problem has a unique solution, i.e., the solution of equation (19) which for
x = b takes the value y(b) just found. This second stage in the solution of
the boundary value problem is called the “backward sweep.” These considerations
apply to the case of a single equation, but a similar method can be
used to deal with systems of second-order differential equations.
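
The two sweeps can be sketched in a few lines of Python. What follows is only an illustration under stated assumptions: the function name `sweep_solve`, the use of the classical fourth-order Runge-Kutta scheme, and the uniform grid are choices made here, not part of the method as described in the text, and no attempt is made to handle a Riccati solution $\alpha(x)$ that becomes unbounded inside [a, b].

```python
import math


def sweep_solve(p, f, a, b, c0, d0, c1, d1, n=200):
    """Solve y'' = p(x) y + f(x), y'(a) = c0 y(a) + d0, y'(b) = c1 y(b) + d1
    by a forward and a backward sweep.  Returns (xs, ys) on n + 1 points."""
    h = (b - a) / (2 * n)                      # fine step; y is returned on every 2nd node

    def rhs(x, al, be):                        # system (21) solved for alpha', beta'
        return p(x) - al * al, f(x) - al * be

    # Forward sweep: integrate (21) from a with the initial conditions (20).
    alpha, beta = [c0], [d0]
    for k in range(2 * n):
        x, al, be = a + k * h, alpha[-1], beta[-1]
        k1 = rhs(x, al, be)
        k2 = rhs(x + h / 2, al + h / 2 * k1[0], be + h / 2 * k1[1])
        k3 = rhs(x + h / 2, al + h / 2 * k2[0], be + h / 2 * k2[1])
        k4 = rhs(x + h, al + h * k3[0], be + h * k3[1])
        alpha.append(al + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]))
        beta.append(be + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

    # At x = b, combine y'(b) = alpha(b) y(b) + beta(b) with condition (18).
    denom = alpha[-1] - c1
    if abs(denom) < 1e-14:
        raise ValueError("y(b) is not uniquely determined")
    y = (d1 - beta[-1]) / denom

    # Backward sweep: integrate (19) from b to a with step -2h (classical RK4),
    # using the stored alpha, beta at the nodes m, m - 1, m - 2.
    ys = [y]
    for m in range(2 * n, 0, -2):
        s1 = alpha[m] * y + beta[m]
        s2 = alpha[m - 1] * (y - h * s1) + beta[m - 1]
        s3 = alpha[m - 1] * (y - h * s2) + beta[m - 1]
        s4 = alpha[m - 2] * (y - 2 * h * s3) + beta[m - 2]
        y = y - (2 * h / 6) * (s1 + 2 * s2 + 2 * s3 + s4)
        ys.append(y)
    ys.reverse()
    xs = [a + 2 * h * j for j in range(n + 1)]
    return xs, ys


# Test problem with known solution y = x + cosh x:  y'' = y - x,
# y'(0) = 1, y'(1) = 1 + sinh 1  (so c0 = c1 = 0, d0 = 1, d1 = 1 + sinh 1).
xs, ys = sweep_solve(lambda x: 1.0, lambda x: -x, 0.0, 1.0,
                     0.0, 1.0, 0.0, 1.0 + math.sinh(1.0))
print(max(abs(y - (x + math.cosh(x))) for x, y in zip(xs, ys)))
```

In this test problem the forward sweep produces $\alpha(x) = \tanh x$ and $\beta(x) = 1 - x \tanh x$, and the printed quantity is the largest deviation of the computed values from the exact solution on the grid.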

The use of the sweep method to solve the boundary value problem con- 
sisting of the differential equation (16) and the boundary conditions (17) 
and (18) has decided advantages over the more traditional method. [In the 
latter method, we first find a general solution of equation (16) and then choose 
the values of the arbitrary constants appearing in this solution in such a way 
that the boundary conditions (17) and (18) are satisfied.] These advantages 
are particularly marked in cases where one must resort to some kind of 
approximate numerical method in order to solve the problem.⁵

The connection between the sweep method and the concept (introduced 
earlier) of the field of a system of second-order differential equations is now 
entirely clear. In fact, in the simple case just considered, the forward sweep 
is nothing but the construction of a field linear in y for equation (16). More- 
over, (21) is just the system of ordinary differential equations to which the 
Hamilton-Jacobi system reduces in the case where we are looking for a field 
linear in y of a single second-order differential equation.⁶

We might have constructed a field starting from the right-hand end point
of the interval [a, b], rather than from the left-hand end point. Thus, our
boundary value problem actually involves two fields for equation (16),
one of which is determined by shifting the boundary condition (17) from a
to b, and the other by shifting the boundary condition (18) from b to a. The
solution of the boundary value problem consisting of the differential equation 
(16) and the boundary conditions (17) and (18) is a curve which is a common 
trajectory of these two fields. Thus, in the sweep method, we construct 
one field (the forward sweep) and then choose one of its trajectories which is 
simultaneously a trajectory of a second field (the backward sweep). 


5 I. S. Berezin and N. P. Zhidkov, Методы Вычислений, Том II (Computational
Methods, Vol. II), Gos. Izd. Fiz.-Mat. Lit., Moscow (1959), Chap. 9, Sec. 9.

6 In Example 1, we considered the even simpler homogeneous differential equation
$y'' = p(x) y$, and correspondingly, we looked for a field of the homogeneous form
$y' = \alpha(x) y$. This led to the Riccati equation (12) for the function $\alpha(x)$, identical with
the first of the equations (21).
32. The Field of a Functional


32.1. We now apply the considerations of the preceding section to variational
problems. The Euler equations
$$ F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), $$
corresponding to the functional
$$ \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{22} $$


form a system of n second-order differential equations. In order to single
out a definite solution of this system, we have to specify 2n supplementary
conditions, which are usually given in the form of boundary conditions, i.e.,
relations connecting the values of $y_i$ and $y_i'$ at the end points of the interval
[a, b] (there are n such relations at each end point). In many cases, of
course, the boundary conditions are determined by the very functional under
consideration. For example, consider the variable end point problem for
the functional
$$ \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx + g^{(1)}(a, y_1, \ldots, y_n) + g^{(2)}(b, y_1, \ldots, y_n), \tag{23} $$


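differing from (22) by two functions $g^{(1)}$ and $g^{(2)}$ of the coordinates of the
end points of the path along which the functional is considered. Calculating
the variation of the functional (23), we obtain
$$ \int_a^b \sum_{i=1}^{n} \left( F_{y_i} - \frac{d}{dx} F_{y_i'} \right) h_i(x)\,dx + \sum_{i=1}^{n} F_{y_i'} h_i(x) \Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i} h_i(a) + \sum_{i=1}^{n} g^{(2)}_{y_i} h_i(b). \tag{24} $$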


Setting (24) equal to zero, and assuming that the curve $y_i = y_i(x)$, $1 \le i \le n$,
is an extremal, we find that
$$ \sum_{i=1}^{n} F_{y_i'} h_i(x) \Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i} h_i(a) + \sum_{i=1}^{n} g^{(2)}_{y_i} h_i(b) = 0. \tag{25} $$
Since $h_i(a)$ and $h_i(b)$ are arbitrary, (25) implies that
$$ \left( F_{y_i'} - g^{(1)}_{y_i} \right) \Big|_{x=a} = 0 \qquad (i = 1, \ldots, n) \tag{26} $$
and
$$ \left( F_{y_i'} + g^{(2)}_{y_i} \right) \Big|_{x=b} = 0 \qquad (i = 1, \ldots, n). \tag{27} $$
If $g^{(1)} = g^{(2)} = 0$, (25) implies
$$ F_{y_i'} \big|_{x=a} = F_{y_i'} \big|_{x=b} = 0, $$
i.e., the natural boundary conditions for a variable end point problem like
the one considered in Sec. 6 [cf. Chap. 1, formula (29)].⁷

Next, we examine in more detail the boundary conditions corresponding
to one end point, say x = a. For simplicity, we write g instead of $g^{(1)}$,
and adopt the vector notation
$$ y = (y_1, \ldots, y_n), \qquad y' = (y_1', \ldots, y_n'), $$
etc., in arguments of functions (cf. Sec. 29). As usual, we introduce the
“momenta” (see footnote 15, p. 86)
$$ p_i(x, y, y') = F_{y_i'}(x, y, y') \qquad (i = 1, \ldots, n), \tag{28} $$
and then write the boundary conditions (26) in the form
$$ p_i(x, y, y') \big|_{x=a} = g_{y_i}(x, y) \big|_{x=a} \qquad (i = 1, \ldots, n). \tag{29} $$
The relations (29) determine $y_1'(a), \ldots, y_n'(a)$ as functions of $y_1(a), \ldots, y_n(a)$:
$$ y_i'(a) = \psi_i(y) \big|_{x=a} \qquad (i = 1, \ldots, n). \tag{30} $$


Boundary conditions that can be derived in this way merit a special name: 


DEFINITION 1. Given a functional
$$ \int_a^b F(x, y, y')\,dx, $$
with momenta (28), the boundary conditions (30), prescribed for x = a,
are said to be self-adjoint if there exists a function g(x, y) such that
$$ p_i[x, y, \psi(y)] \big|_{x=a} = g_{y_i}(x, y) \big|_{x=a} \qquad (i = 1, \ldots, n). \tag{31} $$
THEOREM 1. The boundary conditions (30) are self-adjoint if and only
if they satisfy the conditions
$$ \frac{\partial p_i[x, y, \psi(y)]}{\partial y_k} \bigg|_{x=a} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i} \bigg|_{x=a} \qquad (i, k = 1, \ldots, n), \tag{32} $$
called the self-adjointness conditions.


7 It should also be noted that the boundary conditions corresponding to fixed end points
can be regarded as a limiting case of the boundary conditions (26) and (27), although the
latter involve the additional functions $g^{(1)}$ and $g^{(2)}$. For example, in the case of the functional
$$ \int_a^b F(x, y, y')\,dx + k[y(a) - A]^2, $$
the boundary condition at the left-hand end point is
$$ [F_{y'}(x, y, y') - 2k(y - A)] \big|_{x=a} = 0 $$
or
$$ y(a) = A + \frac{F_{y'}(x, y, y')}{2k} \bigg|_{x=a}. $$
If we now let $k \to \infty$, we obtain in the limit the boundary condition y(a) = A. Similar
considerations apply to the case of several functions $y_1, \ldots, y_n$.


8 The conditions (30) can be thought of as assigning a direction to every point of the
hyperplane x = a. [Cf. formula (2).]


Proof. If the boundary conditions (30) are self-adjoint, then (31)
holds, and hence
$$ \frac{\partial p_i[x, y, \psi(y)]}{\partial y_k} = \frac{\partial^2 g(x, y)}{\partial y_i\,\partial y_k} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i}, $$
which is just (32). Conversely, if the boundary conditions (30) are such
that the functions $p_i[x, y, \psi(y)]$ satisfy (32), then, for x = a, the $p_i$ are
the partial derivatives with respect to $y_i$ of some function g(y),⁹ so that
the boundary conditions (30) are self-adjoint in the sense of Definition 1.


Remark. It is immediately clear that for n = 1, i.e., in the case of variational
problems involving a single unknown function, any boundary condition
is self-adjoint, and in fact, the self-adjointness conditions (32) disappear
for n = 1.
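
To see what the self-adjointness conditions (32) demand once $n \ge 2$, consider the following small illustration (added here; the integrand F and the two sample fields of directions are chosen purely for simplicity and do not come from the original text).

```latex
% Assumption: n = 2 and F = \tfrac12 (y_1'^2 + y_2'^2), so that p_i = F_{y_i'} = y_i'.
\begin{align*}
&\text{With } y_i' = \psi_i(y):\quad p_i[x, y, \psi(y)] = \psi_i(y), \\
&\text{so (32) reduces to}\quad \frac{\partial \psi_1}{\partial y_2} = \frac{\partial \psi_2}{\partial y_1}. \\[4pt]
&\psi = (y_1, y_2):\quad \frac{\partial \psi_1}{\partial y_2} = 0 = \frac{\partial \psi_2}{\partial y_1},
  \quad\text{self-adjoint, with } g = \tfrac12 (y_1^2 + y_2^2); \\
&\psi = (-y_2, y_1):\quad \frac{\partial \psi_1}{\partial y_2} = -1 \ne 1 = \frac{\partial \psi_2}{\partial y_1},
  \quad\text{not self-adjoint.}
\end{align*}
```

For this particular F, then, the self-adjoint boundary conditions are exactly those whose directions form a gradient in the variables $y_1, y_2$.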


32.2. In the preceding section, we introduced the concept of a field for a
system of second-order differential equations. We now define the field of
a functional:


DEFINITION 2. Given a functional
$$ \int_a^b F(x, y, y')\,dx \tag{33} $$
with the system of Euler equations
$$ F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{34} $$
we say that the boundary conditions
$$ y_i' = \psi_i^{(1)}(y) \qquad (i = 1, \ldots, n), \tag{35} $$
prescribed for $x = x_1$, and the boundary conditions
$$ y_i' = \psi_i^{(2)}(y) \qquad (i = 1, \ldots, n), \tag{36} $$
prescribed for $x = x_2$, are (mutually) consistent with respect to the
functional (33) if they are consistent with respect to the system (34), i.e.,
if every extremal satisfying the boundary conditions (35) at $x = x_1$
also satisfies the boundary conditions (36) at $x = x_2$, and conversely.


DEFINITION 3. The family of boundary conditions
$$ y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n), \tag{37} $$


9 See e.g., D. V. Widder, op. cit., Theorem 11, p. 251, and T. M. Apostol, Advanced
Calculus, Addison-Wesley Publishing Co., Inc., Reading, Mass. (1957), Theorem
10-48, p. 296. (We tacitly assume the required regularity of the functions $p_i$ and of
their domain of definition.)
prescribed for every x in the interval [a, b], is said to be a field of the
functional (33) if


1. The conditions (37) are self-adjoint for every x in [a, b];
2. The conditions (37) are consistent for every pair of points $x_1$, $x_2$ in [a, b].


In other words, by a field of the functional (33) is meant a field for the 
corresponding system of Euler equations (34) which satisfies the self- 
adjointness conditions at every point x. The equations (37) represent a 
system of first-order differential equations. Its general solution (the family 
of trajectories of the field) is an n-parameter family of extremals such that
one and only one extremal passes through each point $(x, y_1, \ldots, y_n)$ of the
region where the field is defined.¹⁰

We now give an effective criterion for a given family of boundary con- 
ditions to be the field of a functional: 


THEOREM 2.¹¹ A necessary and sufficient condition for the family of
boundary conditions (37) to be a field of the functional (33) is that the
self-adjointness conditions
$$ \frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{38} $$
and the consistency conditions
$$ \frac{\partial p_i[x, y, \psi(x, y)]}{\partial x} = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} \tag{39} $$
be satisfied at every point x in [a, b], where
$$ p_i(x, y, y') = F_{y_i'}(x, y, y'), \tag{40} $$
and H is the Hamiltonian corresponding to the functional (33):
$$ H(x, y, y') = -F(x, y, y') + \sum_{i=1}^{n} p_i(x, y, y') y_i'. \tag{41} $$


Proof. We have already shown in Theorem 1 that the conditions (38)
are necessary and sufficient for the boundary conditions
$$ y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{42} $$


10 In the calculus of variations, by a field (of extremals) of a functional is usually
meant an n-parameter family of extremals satisfying certain conditions, rather than a 
family of boundary conditions of the type just described. However, as already remarked 
(see footnote 3, p. 133), it seems to us that our somewhat different approach to the con- 
cept of a field has certain advantages. 

11 This theorem is the analog of the theorem of Sec. 31, and the system of partial
differential equations (39) is the analog of the Hamilton-Jacobi system (see p. 133).
to be self-adjoint at every point x in [a, b]. Therefore, it only remains to
show that if (38) holds at every point x in [a, b], then the conditions (39)
are necessary and sufficient for the boundary conditions (42) to be consistent
for $a \le x \le b$. To prove this, we set
$$ y_i' = \psi_i(x, y), \qquad y' = \psi(x, y) $$
in (40) and (41), and substitute the right-hand sides of the resulting
equations into (39). Performing the indicated differentiations and
dropping arguments (to keep the notation concise), we obtain


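$$ F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} = F_{y_i} + \sum_{k=1}^{n} F_{y_k'} \frac{\partial \psi_k}{\partial y_i} - \sum_{k=1}^{n} \frac{\partial F_{y_k'}}{\partial y_i} \psi_k - \sum_{k=1}^{n} F_{y_k'} \frac{\partial \psi_k}{\partial y_i}. \tag{43} $$
Using the self-adjointness conditions
$$ \frac{\partial F_{y_i'}}{\partial y_k} = \frac{\partial F_{y_k'}}{\partial y_i}, $$
we can write (43) in the form
$$ F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} + \sum_{k=1}^{n} \frac{\partial F_{y_i'}}{\partial y_k} \psi_k. \tag{44} $$
Since
$$ \frac{\partial F_{y_i'}}{\partial y_k} = F_{y_i' y_k} + \sum_{j=1}^{n} F_{y_i' y_j'} \frac{\partial \psi_j}{\partial y_k}, $$
(44) becomes
$$ F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k} \psi_k + \sum_{k=1}^{n} F_{y_i' y_k'} \left( \frac{\partial \psi_k}{\partial x} + \sum_{j=1}^{n} \frac{\partial \psi_k}{\partial y_j} \psi_j \right). \tag{45} $$
Along the trajectories of the field, we have
$$ \frac{dy_k}{dx} = \psi_k(x, y), $$
so that
$$ \frac{d^2 y_k}{dx^2} = \frac{\partial \psi_k}{\partial x} + \sum_{j=1}^{n} \frac{\partial \psi_k}{\partial y_j} \psi_j. $$
Therefore, (45) reduces to
$$ F_{y_i} - \frac{d}{dx} F_{y_i'} = 0, \tag{46} $$
where $1 \le i \le n$. This means that the trajectories of the field of
directions (42) are extremals, i.e., (42) is a field of the functional
$$ \int_a^b F(x, y, y')\,dx, \tag{47} $$
and hence the conditions (39) are sufficient. Since the calculations
leading from (39) to (46) are reversible, the conditions (39) are also
necessary, and the theorem is proved.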


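THEOREM 3. The expression
$$ \frac{\partial p_i(x, y, y')}{\partial y_k} - \frac{\partial p_k(x, y, y')}{\partial y_i} \tag{48} $$
has a constant value along each extremal.


Proof. Using (46), we find that
$$ \frac{d}{dx} \left( \frac{\partial p_i}{\partial y_k} - \frac{\partial p_k}{\partial y_i} \right) = \frac{\partial^2 F}{\partial y_i\,\partial y_k} - \frac{\partial^2 F}{\partial y_k\,\partial y_i} = 0. $$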

COROLLARY. Suppose the boundary conditions
$$ y_i' = \psi_i(x, y) \qquad (a \le x \le b;\ 1 \le i \le n) \tag{49} $$
are consistent, i.e., suppose the solutions of the system (49) are extremals
of the functional (47). Then, to prove that the conditions (49) define a
field of the functional (47), it is only necessary to verify that they are
self-adjoint at a single (arbitrary) point in [a, b].


According to Definition 1, the boundary conditions (49) are self-adjoint
if there exists a function g(x, y) such that
$$ p_i[x, y, \psi(x, y)] = g_{y_i}(x, y) \qquad (i = 1, \ldots, n) \tag{50} $$
for $a \le x \le b$. We now ask the following question: What condition has
to be imposed on the function g(x, y) in order for the boundary conditions
(49), defined by the relations (50), to be not only self-adjoint, but also
consistent, at every point of [a, b], i.e., for the boundary conditions (49)
to be a field of the functional (47)? The answer is given by


THEOREM 4. The boundary conditions (49) defined by the relations (50)
are consistent if and only if the function g(x, y) satisfies the Hamilton-Jacobi
equation¹²
$$ \frac{\partial g}{\partial x} + H\left( x, y_1, \ldots, y_n, \frac{\partial g}{\partial y_1}, \ldots, \frac{\partial g}{\partial y_n} \right) = 0. \tag{51} $$


12 Cf. equation (72), p. 90.
Proof. It follows from (50) that the Hamilton-Jacobi equation (51)
can be written in the form
$$ \frac{\partial g}{\partial x} = -H(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \tag{52} $$
where $p_i = p_i[x, y, \psi(x, y)]$. Differentiating (52) with respect to $y_i$,
we obtain
$$ \frac{\partial^2 g}{\partial x\,\partial y_i} = -\frac{\partial H[x, y_1, \ldots, y_n, \psi_1(x, y), \ldots, \psi_n(x, y)]}{\partial y_i}, $$
i.e.,
$$ \frac{\partial p_i}{\partial x} = -\frac{\partial H[x, y_1, \ldots, y_n, \psi_1(x, y), \ldots, \psi_n(x, y)]}{\partial y_i}, $$
which is just the set of consistency conditions (39).
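
A one-dimensional illustration may help fix the roles of $\psi$, p, H and g in Theorem 4. The functional and the field below are assumptions chosen for this sketch (they are not an example from the original text): the integrand is $F = y'^2$, and the field consists of the straight lines through the origin, considered in the region x > 0.

```latex
% Assumptions: n = 1, F = y'^2, field y' = \psi(x, y) = y / x in the region x > 0.
\begin{align*}
&p = F_{y'} = 2y', \qquad H = -F + p\,y' = y'^2 = \tfrac14 p^2, \\
&p[x, y, \psi(x, y)] = \frac{2y}{x} = g_y(x, y)
   \quad\Longrightarrow\quad g(x, y) = \frac{y^2}{x}, \\
&\text{Hamilton-Jacobi equation (51):}\quad
   \frac{\partial g}{\partial x} + \frac14 \left( \frac{\partial g}{\partial y} \right)^2
   = -\frac{y^2}{x^2} + \frac{y^2}{x^2} = 0, \\
&\text{consistency condition (39):}\quad
   \frac{\partial p}{\partial x} = -\frac{2y}{x^2}
   = -\frac{\partial}{\partial y}\left( \frac{y^2}{x^2} \right)
   = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y}.
\end{align*}
```

The trajectories of this field are the lines y = kx, which are indeed extremals of the functional, since its Euler equation is y'' = 0.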


Remark. The connection between the Hamilton-Jacobi system introduced
in Sec. 31 and the Hamilton-Jacobi equation introduced in Sec. 23 is
now apparent. As we saw in Sec. 31, in the case of an arbitrary system of
n second-order differential equations, a field is a system of n first-order
differential equations of the form (49), where the functions $\psi_i(x, y)$ satisfy
the Hamilton-Jacobi system (8). When we deal with the field of a functional,
the system (8) turns into the consistency conditions (39), and in this case,
we impose the additional requirement that the boundary conditions defining
the field be self-adjoint at every point. This means that the field of a
functional is not really determined by n functions $\psi_i(x, y)$, but rather by
a single function g(x, y) from which the functions $\psi_i(x, y)$ are derived by using
the relations (50). In other words, the function g(x, y) is a kind of potential
for the field of a functional. Since the field of a functional is determined by
a single function, instead of by n functions, it is entirely natural that the set
of n consistency conditions for such a field should reduce to a single equation,
i.e., that the Hamilton-Jacobi system should be replaced by the Hamilton-Jacobi
equation.


32.3. Once more, we consider a functional 


$$\int_a^b F(x, y, y')\,dx \tag{53}$$


whose extremals are curves in the (n + 1)-dimensional space of points 
$(x, y) = (x, y_1, \ldots, y_n)$. Let R be a simply connected region in this space, 
and let $c = (c_0, c_1, \ldots, c_n)$ be a point lying outside R. 
DEFINITION 4. Let (x, y) be an arbitrary point of R, and suppose that 
one and only one extremal of the functional (53) leaves c and passes 
through (x, y), thereby defining a direction 


$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{54}$$

at every point of R. Then the field of directions (54) is called a central field. 

THEOREM 5. Every central field (54) is a field of the functional (53), 
i.e., satisfies the consistency and self-adjointness conditions. 


Proof. Consider the function 


$$g(x, y) = \int_c^{(x, y)} F(x, y, y')\,dx, \tag{55}$$


where the integral is taken along the extremal of (53) joining the point 
c to the point (x, y). We define a field of directions in R by setting 

$$F_{y_i'}(x, y, y') \equiv p_i(x, y, y') = g_{y_i}(x, y) \qquad (i = 1, \ldots, n). \tag{56}$$


The theorem will be proved if it can be shown that this field coincides with 
the original field (54), since then the original field will satisfy the consis- 
tency conditions [since its trajectories are extremals] and also the self- 
adjointness conditions [this follows from Theorem 1 applied to the field 
defined by (56)]. But (55) is just the function $S(x, y_1, \ldots, y_n)$ of Sec. 23, 
and hence 

$$g_{y_i}(x, y) = p_i(x, y, z),$$


where z denotes the slope of the extremal joining c to (x, y), evaluated 
at (x, y).¹³ This shows that the field of directions (56) actually coincides 
with the original field (54). 


DEFINITION 5. Given an extremal $\gamma$ of the functional (53), suppose there 
exists a simply connected (open) region R containing $\gamma$ such that 


1. A field of the functional (53) covers R, i.e., is defined at every point 
of R; 
2. One of the trajectories of the field is $\gamma$. 

Then we say that $\gamma$ can be imbedded in a field [of the functional (53)]. 

THEOREM 6. Let $\gamma$ be an extremal of the functional (53), with equation 

$$y = y(x) \qquad (a \le x \le b)$$

in vector form. Moreover, suppose that 

$$\det \|F_{y_i' y_k'}\|$$

is nonvanishing in [a, b], and that no points conjugate to (a, y(a)) lie on $\gamma$. 
Then $\gamma$ can be imbedded in a field. 


Proof. By hypothesis, the following two conditions are satisfied for 
sufficiently small $\varepsilon > 0$: 


1. The extremal $\gamma$ can be extended onto the whole interval $[a - \varepsilon, b]$; 


2. The interval $[a - \varepsilon, b]$ contains no points conjugate to a (cf. foot- 
note 20, p. 121). 


13 See the second of the formulas (70) and footnote 18, p. 90. 

Now consider the family of extremals leaving the point $(a - \varepsilon, y(a - \varepsilon))$. 
Since there are no points conjugate to $a - \varepsilon$ in the interval $[a - \varepsilon, b]$, it 
follows that for $a \le x \le b$ no two extremals in this family which are 
sufficiently close to the original extremal $\gamma$ can intersect. Thus, in some 
region R containing $\gamma$, the extremals sufficiently close to $\gamma$ define a central 
field in which $\gamma$ is imbedded. The proof is now completed by using 
Theorem 5. 


33. Hilbert’s Invariant Integral 


As before, let R be a simply connected region in the (n + 1)-dimensional 
space of points $(x, y) = (x, y_1, \ldots, y_n)$, and let 

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{57}$$

define a field of the functional 

$$\int_a^b F(x, y, y')\,dx \tag{58}$$

in R. It was proved in the preceding section (see Theorem 2) that the field 
of directions (57) is a field of the functional (58) if and only if the functions 
$\psi_i(x, y)$ satisfy the self-adjointness conditions 

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{59}$$

and the consistency conditions 

$$\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} = -\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x}. \tag{60}$$


Taken together, the conditions (59) and (60) imply that the quantity 


$$-H[x, y, \psi(x, y)]\,dx + \sum_{i=1}^n p_i[x, y, \psi(x, y)]\,dy_i$$

is the exact differential of some function (see footnote 9, p. 139) 

$$g(x, y) = g(x, y_1, \ldots, y_n).$$


As is familiar from elementary analysis,¹⁴ this function, which is determined 
to within an additive constant, can be written as a line integral 

$$g(x, y) = \int_\Gamma \left(-H\,dx + \sum_{i=1}^n p_i\,dy_i\right), \tag{61}$$

evaluated along the curve $\Gamma$ going from some fixed point $M_0 = (x_0, y(x_0))$ to 
the variable point M = (x, y). Since the integrand of (61) is an exact 
differential, the choice of the curve $\Gamma$ does not matter; in fact, the value of the 
integral depends only on the points $M_0$, $M$, and not on the curve $\Gamma$. The 
right-hand side of (61) is known as Hilbert's invariant integral. 


14 See e.g., D. V. Widder, op. cit., Theorem 12, p. 251. 

Using the equations (57) defining the field, and explicitly introducing 
the integrand F of the functional (58), we can write the integral in (61) as 

$$\int_\Gamma \left(\left\{F[x, y, \psi(x, y)] - \sum_{i=1}^n \psi_i(x, y)\,F_{y_i'}[x, y, \psi(x, y)]\right\}dx + \sum_{i=1}^n F_{y_i'}[x, y, \psi(x, y)]\,dy_i\right). \tag{62}$$


This expression is Hilbert's invariant integral, in the form corresponding 
to the field defined by the functions $\psi_i(x, y)$. If the curve $\Gamma$ along which the 
integral (62) is evaluated is one of the trajectories of the field, then 

$$dy_i = \psi_i(x, y)\,dx$$

along $\Gamma$, and hence (62) reduces to 

$$\int_\Gamma F(x, y, y')\,dx$$


evaluated along this trajectory. 


Remark. If $\gamma$ is an extremal which is a trajectory of the field, Hilbert's 
invariant integral can be used to write the value of the functional for this 
extremal as an integral evaluated along any curve joining the end points of $\gamma$. 
This important fact will be used in the next section. 
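
The path independence of Hilbert's invariant integral can be checked numerically in a simple case. The following sketch is an illustration under stated assumptions, not part of the original text: it takes n = 1, the integrand F = y'²/2, and the central field ψ(x, y) = y/x of straight lines through the origin, for which the integrand of (62) becomes −(ψ²/2) dx + ψ dy. The helper names and the two sample paths joining (1, 1) to (2, 3) are hypothetical choices. For this field the integral equals g(2, 3) − g(1, 1) with g = y²/(2x), i.e., 1.75, whichever path is used.

```python
def psi(x, y):
    """Slope of the assumed central field of straight lines through the origin."""
    return y / x


def hilbert_integral(path, dpath, s0=0.0, s1=1.0, steps=20000):
    """Midpoint-rule value of Hilbert's integral (62) for F = y'**2 / 2.

    path(s) returns the point (x, y); dpath(s) returns (dx/ds, dy/ds).
    With F = y'**2 / 2 the integrand is -psi**2 / 2 dx + psi dy.
    """
    ds = (s1 - s0) / steps
    total = 0.0
    for k in range(steps):
        s = s0 + (k + 0.5) * ds
        x, y = path(s)
        dx, dy = dpath(s)
        slope = psi(x, y)
        total += (-0.5 * slope * slope * dx + slope * dy) * ds
    return total


def straight_path(s):
    """Straight segment from (1, 1) to (2, 3)."""
    return 1.0 + s, 1.0 + 2.0 * s


def straight_velocity(s):
    return 1.0, 2.0


def curved_path(s):
    """Parabolic arc from (1, 1) to (2, 3)."""
    return 1.0 + s, 1.0 + 2.0 * s * s


def curved_velocity(s):
    return 1.0, 4.0 * s


if __name__ == "__main__":
    print(hilbert_integral(straight_path, straight_velocity))
    print(hilbert_integral(curved_path, curved_velocity))
```

Along a trajectory of the field, such as the line y = x from (1, 1) to (2, 2), the same routine returns the value of the functional itself, in agreement with the reduction of (62) described above.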


34. The Weierstrass E-Function. Sufficient Conditions for a 
Strong Extremum 


DEFINITION. By the Weierstrass E-function of the functional¹⁵ 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{63}$$

we mean the following function of 3n + 1 variables: 

$$E(x, y, z, w) = F(x, y, w) - F(x, y, z) - \sum_{i=1}^n (w_i - z_i)\,F_{y_i'}(x, y, z). \tag{64}$$

In other words, E(x, y, z, w) is the difference between the value of the 
function F (regarded as a function of its last n arguments) at the point w and 
the first two terms of its Taylor's series expansion about the point z. Thus, 
E(x, y, z, w) can also be written as the remainder of a Taylor's series: 

$$E(x, y, z, w) = \frac{1}{2}\sum_{i,k=1}^n (w_i - z_i)(w_k - z_k)\,F_{y_i' y_k'}[x, y, z + \theta(w - z)] \qquad (0 < \theta < 1).$$

For n = 1, the Weierstrass E-function has a simple geometric interpretation, 
since if we regard F(x, y, z) as a function of z, 

$$F(x, y, w) - F(x, y, z) - (w - z)\,F_{y'}(x, y, z)$$

is just the vertical distance from the curve $\Gamma$ representing F(x, y, z) to the 
tangent to $\Gamma$ drawn through a fixed point of $\Gamma$. 


15 Here y(a) = A means $y_1(a) = A_1, \ldots, y_n(a) = A_n$, and similarly for y(b) = B, 
i.e., we are dealing with the fixed end point problem. 
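
To make the geometric interpretation for n = 1 concrete, the sketch below evaluates the E-function directly from its definition (64). It is only an illustration: the arc-length integrand F = √(1 + y'²) and the function names are assumptions introduced here, not taken from the text. Since this F is convex in its last argument, its graph lies above every tangent line, so E should be nonnegative for all z and w and should vanish when w = z.

```python
import math


def weierstrass_e(F, F_z, x, y, z, w):
    """E(x, y, z, w) = F(x, y, w) - F(x, y, z) - (w - z) * F_z(x, y, z) for n = 1."""
    return F(x, y, w) - F(x, y, z) - (w - z) * F_z(x, y, z)


def arc_length(x, y, z):
    """Assumed sample integrand F = sqrt(1 + z**2), with z standing for y'."""
    return math.sqrt(1.0 + z * z)


def arc_length_z(x, y, z):
    """Partial derivative of the sample integrand with respect to z."""
    return z / math.sqrt(1.0 + z * z)


def min_e_on_grid(F, F_z, slopes):
    """Smallest value of E over all pairs (z, w) drawn from the given slopes."""
    return min(
        weierstrass_e(F, F_z, 0.0, 0.0, z, w) for z in slopes for w in slopes
    )


if __name__ == "__main__":
    slopes = [k / 4.0 for k in range(-20, 21)]
    print(min_e_on_grid(arc_length, arc_length_z, slopes))
    print(weierstrass_e(arc_length, arc_length_z, 0.0, 0.0, 0.0, 1.0))
```

The second printed value is the vertical gap at w = 1 between the graph of F and its tangent at z = 0, namely √2 − 1.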

Our goal in this section is to derive sufficient conditions for the functional 
(63) to have a strong extremum. It will be recalled from Secs. 28 and 29 
that the following set of conditions is sufficient for the functional (63) to have 
a weak minimum¹⁶ for the admissible curve $\gamma$: 


Condition 1. The curve $\gamma$ is an extremal; 


Condition 2. The matrix $\|F_{y_i' y_k'}\|$ is positive definite along $\gamma$; 


Condition 3. The interval [a, b] contains no points conjugate to a. 


Every strong extremum is simultaneously a weak extremum, but the 
converse is in general false (see p. 13). Therefore, in looking for sufficient 
conditions for a strong extremum, it is natural to assume from the outset 
that the three conditions just listed are satisfied. We then try to supplement 
them in such a way as to obtain a set of conditions guaranteeing a strong 
extremum as well as a weak extremum. To find such supplementary con- 
ditions, we first recall that Conditions 2 and 3 imply that the given extremal $\gamma$ 
can be imbedded in a field 

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{65}$$

of the functional (63) [see Theorem 6 of Sec. 32].¹⁷ Let $\gamma$ have the equations 

$$y_i = y_i(x) \qquad (i = 1, \ldots, n),$$


and let $\gamma^*$ be an arbitrary curve with the same end points as $\gamma$, lying in the 
(n + 1)-dimensional region R containing $\gamma$ and covered by the field (see 
Definition 5 of Sec. 32). Then, according to equation (62) and the remark 
at the end of Sec. 33, we have 

$$\int_\gamma F(x, y, y')\,dx = \int_{\gamma^*}\left(\left\{F(x, y, \psi) - \sum_{i=1}^n \psi_i\,F_{y_i'}(x, y, \psi)\right\}dx + \sum_{i=1}^n F_{y_i'}(x, y, \psi)\,dy_i\right), \tag{66}$$

where for simplicity we omit the arguments of the functions $\psi$ and $\psi_i$. The 
right-hand side of (66) is just Hilbert's invariant integral, in the form corre- 
sponding to the field (65). As usual, we are interested in the increment 

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_\gamma F(x, y, y')\,dx.$$

Using (66), we find that 

$$\begin{aligned}
\Delta J &= \int_{\gamma^*} F(x, y, y')\,dx - \int_{\gamma^*}\left(\left\{F(x, y, \psi) - \sum_{i=1}^n \psi_i\,F_{y_i'}(x, y, \psi)\right\}dx + \sum_{i=1}^n F_{y_i'}(x, y, \psi)\,dy_i\right) \\
&= \int_{\gamma^*}\left(F(x, y, y') - F(x, y, \psi) - \sum_{i=1}^n (y_i' - \psi_i)\,F_{y_i'}(x, y, \psi)\right)dx,
\end{aligned}$$

or in terms of the Weierstrass E-function,¹⁸ 

$$\Delta J = \int_{\gamma^*} E(x, y, \psi, y')\,dx. \tag{67}$$


16 To be explicit, we consider only conditions for a minimum. To obtain conditions 
for a maximum, we need only reverse the directions of all inequalities. 

17 The only part of Condition 2 that is used here is the fact that $\det \|F_{y_i' y_k'}\|$ is non- 
vanishing (in fact, positive) in [a, b]. 


We are now in a position to state sufficient conditions for a strong 
extremum. 
THEOREM 1. Let $\gamma$ be an extremal, and let 

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{68}$$

be a field of the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B. \tag{69}$$

Suppose that at every point $(x, y) = (x, y_1, \ldots, y_n)$ of some (open) region 
containing $\gamma$ and covered by the field (68),¹⁹ the condition 

$$E(x, y, \psi, w) \ge 0 \tag{70}$$

is satisfied for every finite vector $w = (w_1, \ldots, w_n)$. Then J[y] has a 
strong minimum for the extremal $\gamma$. 


18 More explicitly, 

$$\Delta J = \int_a^b E(x, y^*, \psi, y^{*\prime})\,dx,$$

where $y_i = y_i^*(x)$ are the equations of the curve $\gamma^*$. 

19 By hypothesis, such a region R exists. 

Proof. To say that the functional J[y] has a strong minimum for the 
extremal $\gamma$ means that $\Delta J$ is nonnegative for any admissible curve $\gamma^*$ 
which is sufficiently close to $\gamma$ in the norm of the space $\mathscr{C}(a, b)$. But the 
condition (70) guarantees that the increment $\Delta J$, given by (67), is non- 
negative for all such curves. Note that we do not impose any restrictions 
at all on the slope of the curve $\gamma^*$, i.e., $\gamma^*$ need not be close to $\gamma$ in the 
norm of the space $\mathscr{D}_1(a, b)$. In fact, $\gamma^*$ need not even belong to $\mathscr{D}_1(a, b)$.²⁰ 


Remark 1. As already noted, the hypothesis that the extremal $\gamma$ can be 
imbedded in a field can be replaced by Conditions 2 and 3. 


Remark 2. Since the Weierstrass E-function can be written in the form 

$$E(x, y, \psi, w) = \frac{1}{2}\sum_{i,k=1}^n (w_i - \psi_i)(w_k - \psi_k)\,F_{y_i' y_k'}[x, y, \psi + \theta(w - \psi)] \qquad (0 < \theta < 1)$$

(see p. 147), we can replace (70) by the condition that at every point of some 
region R containing $\gamma$, the matrix $\|F_{y_i' y_k'}(x, y, z)\|$ be nonnegative definite 
for every finite z. 

We conclude this section by indicating the following necessary condition 
for a strong extremum: 


THEOREM 2 (Weierstrass' necessary condition). If the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a strong minimum for the extremal $\gamma$, then 

$$E(x, y, y', w) \ge 0 \tag{71}$$

along $\gamma$ for every finite w. 


The idea of the proof is the following: If (71) is not satisfied, there exists 
a point $\xi$ in [a, b] and a vector q such that 

$$E[\xi, y(\xi), y'(\xi), q] < 0, \tag{72}$$

where y = y(x) is the equation of the extremal $\gamma$. It can then be shown that 
a suitable modification of $\gamma$ leads to an admissible curve $\gamma^*$ close to $\gamma$ in 
the norm of the space $\mathscr{C}(a, b)$ such that 

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_\gamma F(x, y, y')\,dx < 0, \tag{73}$$

which contradicts the hypothesis that J[y] has a strong minimum for $\gamma$. 
However, the construction of $\gamma^*$ must be carried out carefully, since all we 
know is that (72) holds for a suitable q (see Probs. 9 and 10). 
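
As a hedged illustration of how the necessary condition can fail, the sketch below evaluates the E-function for the integrand F = y'³ (the one appearing in Prob. 6) along an extremal of slope z > 0. The function names are hypothetical. For this integrand one finds by direct expansion that E = (w − z)²(w + 2z), which is nonnegative for w close to z but negative as soon as w < −2z, so an inequality of the form (72) holds for a suitable q even though the extremal passes the tests for a weak minimum.

```python
def weierstrass_e(F, F_z, x, y, z, w):
    """E(x, y, z, w) = F(x, y, w) - F(x, y, z) - (w - z) * F_z(x, y, z) for n = 1."""
    return F(x, y, w) - F(x, y, z) - (w - z) * F_z(x, y, z)


def cubic(x, y, z):
    """Integrand F = y'**3, with z standing for y'."""
    return z ** 3


def cubic_z(x, y, z):
    """Partial derivative of F = y'**3 with respect to y'."""
    return 3.0 * z * z


def closed_form(z, w):
    """Factorized form (w - z)**2 * (w + 2*z) of E for the cubic integrand."""
    return (w - z) ** 2 * (w + 2.0 * z)


if __name__ == "__main__":
    z = 1.0  # slope b/a of the extremal y = b*x/a, here with a = b
    for w in (1.5, 0.0, -1.0, -3.0):
        print(w, weierstrass_e(cubic, cubic_z, 0.0, 0.0, z, w), closed_form(z, w))
```

For w = −3 the value is negative, so (71) is violated along the extremal and a strong minimum is ruled out.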


20 In problems involving strong extrema of the functional (69), we allow broken 
extremals, i.e., the admissible curves need only be piecewise smooth (and satisfy the 
boundary conditions). 

PROBLEMS 


1. Find the curve joining the points (−1, −1) and (1, 1) which minimizes 
the functional 

$$J[y] = \int_{-1}^{1} (x^2 y'^2 + 12y^2)\,dx.$$

What is the nature of the minimum? 

Hint. $\Delta J = J[y + h] - J[y] = \int_{-1}^{1} (x^2 h'^2 + 12h^2)\,dx > 0.$ 


Ans. J[y] has a strong minimum for $y = x^3$. 


2. Find the curve joining the points (1, 3) and (2, 5) which minimizes the func- 
tional 

$$J[y] = \int_1^2 y'(1 + x^2 y')\,dx.$$

What is the nature of the minimum? 
Hint. Again calculate $\Delta J$. 


3. Prove that the segment of the x-axis joining x = 0 to x = π corresponds to 
a weak minimum but not a strong minimum of the functional 

$$J[y] = \int_0^\pi y'^2(1 - y'^2)\,dx, \qquad y(0) = 0, \quad y(\pi) = 0.$$

Hint. Calculate J[y] for 

$$y = \frac{1}{\sqrt{n}}\sin nx.$$

4. Prove that the extrema of the functional 

$$\int_a^b n(x, y)\sqrt{1 + y'^2}\,dx$$

are always strong minima if n(x, y) > 0 for all x and y. 


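5. Investigate the extrema of the following functionals: 

a) $J[y] = \int_{-1}^{2} y'(1 + x^2 y')\,dx, \quad y(-1) = 1, \quad y(2) = 1;$ 

b) $J[y] = \int_0^{\pi/4} (4y^2 - y'^2 + 8y)\,dx, \quad y(0) = -1, \quad y(\pi/4) = 0;$ 

c) $J[y] = \int_1^2 (x^2 y'^2 + 12y^2)\,dx, \quad y(1) = 1, \quad y(2) = 8;$ 

d) $J[y] = \int_0^1 (y'^2 + y^2 + 2ye^{2x})\,dx, \quad y(0) = \tfrac{1}{3}, \quad y(1) = \tfrac{1}{3}e^2.$ 

Ans. b) A strong maximum for y = sin 2x − 1; d) A strong minimum for 
$y = \tfrac{1}{3}e^{2x}$. 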
6. Prove that y = bx/a is a weak minimum but not a strong minimum of the 
functional 

$$J[y] = \int_0^a y'^3\,dx,$$

where y(0) = 0, y(a) = b, a > 0, b > 0. 
Hint. Examine the corresponding Weierstrass E-function. 

7. Show that the extremals which give weak minima in Chap. 5, Prob. 10 
do not give strong minima. 


8. Show that the extremal y = 0 of the functional 

$$J[y] = \int_0^1 (ay'^2 - 4byy'^3 + 2bxy'^4)\,dx,$$

where 

$$y(0) = 0, \quad y(1) = 0, \quad a > 0, \quad b > 0,$$

satisfies both the strengthened Legendre condition and Weierstrass' necessary 
condition. Also verify that y = 0 can be imbedded in a field of the func- 
tional J[y]. Does y = 0 correspond to a strong minimum of J[y]? 

Hint. Choose 

$$y = y_0(x) = \begin{cases} \dfrac{k}{h}\,x & \text{for } 0 \le x \le h, \\[2mm] \dfrac{k(1 - x)}{1 - h} & \text{for } h \le x \le 1. \end{cases}$$

Then, given any k > 0 however small, there is an h > 0 such that $J[y_0] < 0$. 


Ans. No. 


9. Complete the proof of Weierstrass’ necessary condition, begun on p. 149. 

Hint. By continuity of the E-function, we can always arrange for the point 
$\xi$ to be an interior point of [a, b]. Choose h > 0 such that $\xi - h > a$, and 
construct the function 

$$y = y_h(x) = \begin{cases} y(x) + (x - a)Q & \text{for } a \le x \le \xi - h, \\ (x - \xi)q + y(\xi) & \text{for } \xi - h \le x \le \xi, \\ y(x) & \text{for } \xi \le x \le b, \end{cases}$$

where y = y(x) is the equation of the extremal $\gamma$, and Q is the vector deter- 
mined by the condition 

$$y(\xi - h) + (\xi - a - h)Q = -qh + y(\xi).$$

Then let $\Delta(h) = J[y_h] - J[y]$. Prove that $\Delta'(0) = E[\xi, y(\xi), y'(\xi), q] < 0$, 
which, together with $\Delta(0) = 0$, implies that $J[y_h] - J[y] < 0$ for small 
enough h. 


10. Give another proof of Weierstrass’ necessary condition, based on the 
direct use of Hilbert’s invariant integral. 


Hint. Let $M_1$ be the point $(\xi, y(\xi))$. From a point $M_0$ on $\gamma$ sufficiently 
close to $M_1$, construct a central field of the functional. Let R be the region 
covered by this field, and let $\Phi(M)$ be the value of Hilbert's invariant integral 
evaluated along any curve in R joining $M_0$ to the variable point M in R. 
Draw two surfaces $\sigma_1$ and $\sigma_2$ of the one-parameter family $\Phi(M)$ = const, 
the first intersecting $\gamma$ in a point $M_2$ lying between $M_0$ and $M_1$, the second 
intersecting $\gamma$ in the point $M_1$. Moreover, from $M_1$ draw the straight line 
with direction q, and let this line intersect $\sigma_2$ in a point $M_3$. Finally, let $\gamma^*$ 
be obtained from $\gamma$ by replacing the part of $\gamma$ from $M_0$ to $M_1$ by the curve 
$M_0 M_3 M_1$, where $M_0 M_3$ is the extremal from $M_0$ to $M_3$ and $M_3 M_1$ is the 
straight line segment from $M_3$ to $M_1$. Again using Hilbert's invariant 
integral, prove that $\gamma^*$ satisfies the inequality (72). 

VARIATIONAL PROBLEMS INVOLVING MULTIPLE INTEGRALS 


In this chapter, we discuss a variety of topics pertaining to functionals 
which depend on functions of two or more variables. Such functionals 
arise, for example, in mechanical problems involving systems with infinitely 
many degrees of freedom (strings, membranes, etc.). In our treatment of 
systems consisting of a finite number of particles (see Chapter 4), we derived 
the principle of least action and a general method for obtaining conservation 
laws (Noether’s theorem). These methods will now be applied to systems 
with infinitely many degrees of freedom. 


35. Variation of a Functional Defined on a Fixed Region 


Consider the functional 


$$J[u] = \int \cdots \int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{1}$$

depending on n independent variables $x_1, \ldots, x_n$, an unknown function u 
of these variables, and the partial derivatives $u_{x_1}, \ldots, u_{x_n}$ of u. (As usual, 
it is assumed that the integrand F has continuous first and second derivatives 
with respect to all its arguments.) We now calculate the variation of (1), 
assuming that the region R stays fixed, while the function $u(x_1, \ldots, x_n)$ 
goes into 

$$u^*(x_1, \ldots, x_n) = u(x_1, \ldots, x_n) + \varepsilon\psi(x_1, \ldots, x_n) + \cdots, \tag{2}$$

where the dots denote terms of order higher than 1 relative to $\varepsilon$. By the 
variation $\delta J$ of the functional (1), corresponding to the transformation (2), 
we mean the principal linear part (in $\varepsilon$) of the difference 

$$J[u^*] - J[u].$$

For simplicity, we write u(x), $\psi(x)$ instead of $u(x_1, \ldots, x_n)$, $\psi(x_1, \ldots, x_n)$, 
dx instead of $dx_1 \cdots dx_n$, etc. Then, using Taylor's theorem, we find that 

$$\begin{aligned}
J[u^*] - J[u] &= \int_R \{F[x, u(x) + \varepsilon\psi(x), u_{x_1}(x) + \varepsilon\psi_{x_1}(x), \ldots, u_{x_n}(x) + \varepsilon\psi_{x_n}(x)] \\
&\qquad\quad - F[x, u(x), u_{x_1}(x), \ldots, u_{x_n}(x)]\}\,dx \\
&= \varepsilon\int_R \left(F_u\psi + \sum_{i=1}^n F_{u_{x_i}}\psi_{x_i}\right)dx + \cdots,
\end{aligned}$$

where the dots again denote terms of order higher than 1 relative to $\varepsilon$. It 
follows that 

$$\delta J = \varepsilon\int_R \left(F_u\psi + \sum_{i=1}^n F_{u_{x_i}}\psi_{x_i}\right)dx \tag{3}$$

is the variation of the functional (1). 
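Next, we try to represent the variation of the functional (1) as an integral 
of an expression of the form 

$$G(x)\psi(x) + \operatorname{div}(\cdots),$$

i.e., we try to transform the expression (3) in such a way that the derivatives 
$\psi_{x_i}$ only appear in a combination of terms which can be written as a diver- 
gence. To achieve this, we replace 

$$F_{u_{x_i}}\psi_{x_i}(x)$$

by 

$$\frac{\partial}{\partial x_i}\left[F_{u_{x_i}}\psi(x)\right] - \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\right)\psi(x)$$

in (3), obtaining 

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi(x)\,dx + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left[F_{u_{x_i}}\psi(x)\right]dx. \tag{4}$$

This expression for the variation $\delta J$ has the important feature that its second 
term is the integral of a divergence, and hence can be reduced to an integral 
over the boundary $\Gamma$ of the region R. In fact, let $d\sigma$ be the area of a variable 
element of $\Gamma$, regarded as an (n − 1)-dimensional surface. Then the 
n-dimensional version of Green's theorem states that 

$$\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left[F_{u_{x_i}}\psi(x)\right]dx = \int_\Gamma \psi(x)(G, \nu)\,d\sigma, \tag{5}$$

where 

$$G = (F_{u_{x_1}}, \ldots, F_{u_{x_n}})$$

is the n-dimensional vector whose components are the derivatives $F_{u_{x_i}}$, 
$\nu = (\nu_1, \ldots, \nu_n)$ is the unit outward normal to $\Gamma$, and $(G, \nu)$ denotes the 
scalar product of G and $\nu$. Using (5), we can write (4) in the form 

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi(x)\,dx + \varepsilon\int_\Gamma \psi(x)(G, \nu)\,d\sigma, \tag{6}$$

where the integral over R no longer involves the derivatives of $\psi(x)$. 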

In order for the functional (1) to have an extremum, we must require 
that $\delta J = 0$ for all admissible $\psi(x)$, in particular, that $\delta J = 0$ for all admissible 
$\psi(x)$ which vanish on the boundary $\Gamma$. For such functions, (6) reduces to 

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi(x)\,dx,$$

and then, because of the arbitrariness of $\psi(x)$ inside R, $\delta J = 0$ implies that 

$$F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}} = 0 \tag{7}$$

for all $x \in R$. This is the Euler equation of the functional (1), and is the 
n-dimensional generalization of formula (24) of Sec. 5.¹ 
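
A standard special case shows what (7) looks like in practice. The sketch below is an illustration under stated assumptions rather than part of the original text: it takes n = 2 and the Dirichlet integrand F = (u_x² + u_y²)/2 (with the two independent variables written x and y), for which F_u = 0 and the left-hand side of (7) reduces to −(u_xx + u_yy). The function names and the two sample functions are hypothetical. A five-point finite-difference stencil is used to evaluate this left-hand side.

```python
def laplacian(u, x, y, h=1e-3):
    """Five-point finite-difference approximation of u_xx + u_yy at (x, y)."""
    return (
        u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4.0 * u(x, y)
    ) / (h * h)


def euler_residual(u, x, y, h=1e-3):
    """Left-hand side of (7) for the assumed integrand F = (u_x**2 + u_y**2) / 2.

    Here F_u = 0, F_{u_x} = u_x and F_{u_y} = u_y, so the residual is
    minus the Laplacian of u.
    """
    return -laplacian(u, x, y, h)


def harmonic(x, y):
    """Sample function with u_xx + u_yy = 0, hence a solution of (7)."""
    return x * x - y * y


def paraboloid(x, y):
    """Sample function with u_xx + u_yy = 4, hence not a solution of (7)."""
    return x * x + y * y


if __name__ == "__main__":
    print(euler_residual(harmonic, 0.3, 0.7))
    print(euler_residual(paraboloid, 0.3, 0.7))
```

The first residual vanishes up to rounding, while the second is close to −4, so only the harmonic function can be an extremal of this functional.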


Remark. In deriving (7), we assumed that the region of integration R 
appearing in the functional (1) is fixed. Generalization of (7) to the case 
where the region of integration is variable will be made in Sec. 36. 


36. Variational Derivation of the Equations of Motion of 
Continuous Mechanical Systems 


As we saw in Sec. 21, the equations of motion of a mechanical system 
consisting of n particles can be derived from the principle of least action, 
which states that the actual trajectory of the system in phase space mini- 
mizes the action functional 


$$\int_{t_0}^{t_1} (T - U)\,dt, \tag{8}$$


where T is the kinetic energy and U the potential energy of the system of 
particles. We now use this principle, together with our basic formula for 
the first variation, to derive the equations of motion and the appropriate 
boundary conditions for some simple mechanical systems with infinitely 
many degrees of freedom, namely, the vibrating string, membrane and plate. 


1 As we shall see in the next section, boundary conditions for the equation (7) can be 
obtained by removing the restriction that $\psi(x) = 0$ on $\Gamma$, and then setting $\delta J = 0$ after 
substitution of (7) into (4) or (6). 

36.1. The vibrating string. Consider the transverse motion of a string 
(i.e., a homogeneous flexible cord) of length l and linear mass density $\rho$. 
Suppose the ends of the string (at x = 0 and x = l) are fastened elastically, 
which means that if either end is displaced from its equilibrium position, 
a restoring force proportional to the displacement appears. This can be 
achieved, for example, by fastening the ends of the string to two rings which 
are constrained to move along two parallel rods, while the rings themselves 
are held in their initial positions by two ideal springs,² as shown in Fig. 8. 

[FIGURE 8] 

Let the equilibrium position of the string lie along the x-axis, and let u(x, t) 
denote the displacement of the string at the point x and time t from its 
equilibrium position. Then, at time t, the kinetic energy of the element of 
string which initially lies between $x_0$ and $x_0 + \Delta x$ is clearly 

$$\tfrac{1}{2}\rho u_t^2(x_0, t)\,\Delta x. \tag{9}$$

Integrating (9) from 0 to l, we find that the kinetic energy of the whole string 
at time t equals 

$$T = \frac{1}{2}\rho\int_0^l u_t^2(x, t)\,dx. \tag{10}$$

To find the potential energy of the string, we use the following argument: 
The potential energy of the string in the position described by the function 
u(x, t), where t is fixed, is just the work required to move the string from 
its equilibrium position u = 0 into the given position u(x, t). Let $\tau$ denote 
the tension in the string, and consider the element of string indicated by AB 
in Figure 9, which initially occupies the position DE along the x-axis, i.e., 
the interval $[x_0, x_0 + \Delta x]$.³ To calculate the amount of work needed to move 
DE to AB, we first move DE to the position AC. This requires no work at 
all, since the force (the tension in the string) is perpendicular to the dis- 
placement.⁴ Next, we stretch the string from the position AC to the position 
AC′, where the length of AC′ equals the length of AB. This obviously 
requires an amount of work equal to $\tau\delta$, where $\delta$ is the length of CC′. Finally, 
we rotate AC′ about the point A into the final position AB. Like the first 
step, this requires no work at all, since at each stage of the rotation the 
force is perpendicular to the displacement. Thus, the total amount of work 
required to move DE to AB is just the product of $\tau$ and the increase in 
length of the element of string, i.e., the quantity 

$$\tau\left[\sqrt{(\Delta x)^2 + (\Delta u)^2} - \Delta x\right] = \frac{1}{2}\tau\left(\frac{\Delta u}{\Delta x}\right)^2 \Delta x + \cdots = \frac{1}{2}\tau u_x^2(x_0, t)\,\Delta x + \cdots, \tag{11}$$

where the dots indicate terms of order higher than those written ($\Delta u/\Delta x \ll 1$ 
for all t, since the vibrations are small). 


2 The springs are ideal in the sense that they have zero length when not stretched. 

3 Since we only consider the case of small vibrations, the string can be assumed to have 
constant length and constant tension. In the present approximation, we can also assume 
that AB is a straight line segment. 

4 It should be emphasized that since the string is assumed to be absolutely flexible, 
all the work is expended in stretching the string, and none in bending it. 
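
The size of the neglected terms in (11) is easy to probe numerically. The sketch below is a hedged illustration with hypothetical function names and sample values: it compares the exact work of stretching, τ[√((Δx)² + (Δu)²) − Δx], with the leading term ½τ(Δu/Δx)²Δx for several slopes Δu/Δx. The relative discrepancy should behave roughly like (Δu/Δx)²/4, which is negligible for small vibrations.

```python
import math


def stretch_work(dx, du, tau):
    """Exact work tau * (sqrt(dx**2 + du**2) - dx) needed to stretch the element."""
    return tau * (math.hypot(dx, du) - dx)


def stretch_work_approx(dx, du, tau):
    """Leading term (1/2) * tau * (du/dx)**2 * dx of the expansion (11)."""
    return 0.5 * tau * (du / dx) ** 2 * dx


def relative_error(dx, du, tau):
    """Relative difference between the exact work and its leading term."""
    exact = stretch_work(dx, du, tau)
    return abs(exact - stretch_work_approx(dx, du, tau)) / exact


if __name__ == "__main__":
    tau = 2.0  # sample tension, chosen only for illustration
    for slope in (0.5, 0.1, 0.01):
        print(slope, relative_error(1.0, slope, tau), slope * slope / 4.0)
```

The last two printed columns agree more and more closely as the slope decreases, which is the sense in which the dots in (11) denote higher-order terms.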


[FIGURE 9] 


Integrating (11) from 0 to l, we find that the potential energy of the whole 
string is 

$$U_1 = \frac{1}{2}\tau\int_0^l u_x^2(x, t)\,dx, \tag{12}$$

except for the work expended in displacing the elastically fastened ends of 
the string from their equilibrium positions. This work equals 

$$U_2 = \frac{1}{2}\kappa_1 u^2(0, t) + \frac{1}{2}\kappa_2 u^2(l, t), \tag{13}$$

where $\kappa_1$ and $\kappa_2$ are positive constants (the elastic moduli of the springs). 
[In fact, the force $f_1$ acting on the end point $P_1$ (see Figure 8) is proportional 
to the displacement $\xi$ of $P_1$ from its equilibrium position x = 0, u = 0, i.e., 

$$f_1 = \kappa_1\xi, \tag{14}$$

where $\kappa_1 > 0$ is a constant; integration of (14) shows that the work required 
to move $P_1$ from (0, 0) to (0, u(0, t)), its position at time t, is given by 

$$\int_0^{u(0, t)} \kappa_1\xi\,d\xi = \frac{1}{2}\kappa_1 u^2(0, t),$$
and similarly for the other end point $P_2$.] Then, adding (12) and (13), we find 
that the total potential energy of the string in the position described by 
the function u(x, t) is 

$$U = U_1 + U_2 = \frac{\tau}{2}\int_0^l u_x^2(x, t)\,dx + \frac{\kappa_1}{2}u^2(0, t) + \frac{\kappa_2}{2}u^2(l, t). \tag{15}$$

Finally, using (10) and (15), we write the action (8) for the vibrating string, 
obtaining the functional 

$$J[u] = \frac{1}{2}\int_{t_0}^{t_1}\int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)]\,dx\,dt - \frac{1}{2}\kappa_1\int_{t_0}^{t_1} u^2(0, t)\,dt - \frac{1}{2}\kappa_2\int_{t_0}^{t_1} u^2(l, t)\,dt. \tag{16}$$

According to the principle of least action, $\delta J$ must vanish for the function 
u(x, t) which describes the actual motion of the string. Thus, we now 
calculate the variation $\delta J$ of the functional (16). Suppose we go from the 
function u(x, t) to the "varied" function 

$$u^*(x, t) = u(x, t) + \varepsilon\psi(x, t) + \cdots.$$


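Then, using formula (4) and the fact that the variation of a sum equals the 
sum of the variations of the separate terms, we find that 

$$\begin{aligned}
\delta J = {} & \varepsilon\int_{t_0}^{t_1}\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt \\
& - \varepsilon\kappa_1\int_{t_0}^{t_1} u(0, t)\psi(0, t)\,dt - \varepsilon\kappa_2\int_{t_0}^{t_1} u(l, t)\psi(l, t)\,dt \\
& + \varepsilon\int_{t_0}^{t_1}\int_0^l \frac{\partial}{\partial x}[-\tau u_x(x, t)\psi(x, t)]\,dx\,dt \\
& + \varepsilon\int_{t_0}^{t_1}\int_0^l \frac{\partial}{\partial t}[\rho u_t(x, t)\psi(x, t)]\,dx\,dt.
\end{aligned} \tag{17}$$

If we assume that the admissible functions $\psi(x, t)$ are such that 

$$\psi(x, t_0) = 0, \quad \psi(x, t_1) = 0 \qquad (0 \le x \le l),$$

i.e., that u(x, t) is not varied at the initial and final times, then the last term 
in (17) vanishes and the next to the last term reduces to 

$$\varepsilon\int_{t_0}^{t_1} [\tau u_x(0, t)\psi(0, t) - \tau u_x(l, t)\psi(l, t)]\,dt.$$

It follows that the variation (17) can be written in the form 

$$\begin{aligned}
\delta J = \varepsilon\Bigg\{ & \int_{t_0}^{t_1}\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt \\
& - \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt \\
& - \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt \Bigg\}.
\end{aligned} \tag{18}$$

According to the principle of least action, the expression (18) must vanish 
for the function u(x, t) corresponding to the actual motion of the string. 
Suppose first that $\psi(x, t)$ vanishes at the ends of the string,⁵ i.e., that 

$$\psi(0, t) = 0, \quad \psi(l, t) = 0 \qquad (t_0 \le t \le t_1). \tag{19}$$


5 If $\delta J$ vanishes for all admissible $\psi(x, t)$, it certainly vanishes for all admissible $\psi(x, t)$ 
satisfying the extra condition (19). 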


Then (18) reduces to just 

$$\delta J = \varepsilon\int_{t_0}^{t_1}\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt. \tag{20}$$

Setting (20) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$ 
and of the function $\psi(x, t)$ for $0 < x < l$, $t_0 < t < t_1$ (cf. the lemma of 
Sec. 5), we find that 

$$u_{tt}(x, t) = a^2 u_{xx}(x, t) \qquad \left(a^2 = \frac{\tau}{\rho}\right) \tag{21}$$

for $0 \le x \le l$ and all t. This result, called the equation of the vibrating 
string, is the Euler equation of the functional 

$$\frac{1}{2}\int_{t_0}^{t_1}\int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)]\,dx\,dt.$$
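
As a quick sanity check on (21), the following sketch verifies by finite differences that a standing wave of the form sin(kπx/l) cos(kπat/l) satisfies u_tt = a²u_xx with a² = τ/ρ. It is an illustration only: the function names, the sample values of ρ, τ and l, and the choice of this particular solution (which also vanishes at x = 0 and x = l) are assumptions made here, not part of the original text.

```python
import math


def make_mode(length, a, k=1):
    """Return the standing wave u(x, t) = sin(k*pi*x/l) * cos(k*pi*a*t/l)."""
    def u(x, t):
        return math.sin(k * math.pi * x / length) * math.cos(
            k * math.pi * a * t / length
        )
    return u


def second_difference(f, s, h=1e-4):
    """Central second difference approximating f''(s)."""
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / (h * h)


def wave_residual(u, x, t, a, h=1e-4):
    """Finite-difference value of u_tt - a**2 * u_xx at the point (x, t)."""
    u_tt = second_difference(lambda s: u(x, s), t, h)
    u_xx = second_difference(lambda s: u(s, t), x, h)
    return u_tt - a * a * u_xx


if __name__ == "__main__":
    rho, tau, length = 2.0, 8.0, 1.0   # sample density, tension and length
    a = math.sqrt(tau / rho)           # wave speed from a**2 = tau / rho
    u = make_mode(length, a)
    print(wave_residual(u, 0.3, 0.2, a))
    print(u(0.0, 0.2), u(length, 0.2))
```

The residual is zero up to discretization error, while the individual terms u_tt and a²u_xx are of order ten at the sample point, so the cancellation is not accidental.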


Next, we remove the restriction (19). Since u(x, t) must satisfy (21), the 
first term in (18) vanishes, and we have 

$$\delta J = -\varepsilon\left\{\int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt + \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt\right\}. \tag{22}$$

This expression must also vanish for the function u(x, t) corresponding to the 
actual motion of the string. Since $[t_0, t_1]$ is arbitrary and $\psi(0, t)$, $\psi(l, t)$ are 
arbitrary admissible functions, equating (22) to zero leads to the relations 

$$\kappa_1 u(0, t) - \tau u_x(0, t) = 0 \tag{23}$$

and 

$$\kappa_2 u(l, t) + \tau u_x(l, t) = 0 \tag{24}$$

for all t. Thus, finally, the function u(x, t) which describes the oscillations 
of the string must satisfy (21) and the boundary conditions 

$$\alpha u(0, t) + u_x(0, t) = 0 \qquad \left(\alpha = -\frac{\kappa_1}{\tau}\right) \tag{25}$$

and 

$$\beta u(l, t) + u_x(l, t) = 0 \qquad \left(\beta = \frac{\kappa_2}{\tau}\right), \tag{26}$$

which connect the displacement from equilibrium and the direction of the 
tangent at each end of the string. 

Next, suppose the ends of the string are free, which means that the springs 
shown in Fig. 8 are absent and the rings fastening the string to the lines 
x = 0, x = l can move up and down freely. Then $\kappa_1 = \kappa_2 = 0$, and the 
boundary conditions (23), (24) become 

$$u_x(0, t) = 0, \qquad u_x(l, t) = 0.$$

Thus, at a free end point, the tangent to the string always preserves the same 
slope (zero) as it had in the equilibrium position. 

The case where the ends of the string are fixed, corresponding to the 
boundary conditions 

$$u(0, t) = 0, \qquad u(l, t) = 0, \tag{27}$$

can be regarded as a limit of the case of elastically fastened ends. In fact, 
let the stiffness of the springs binding the ends of the string to their initial 
positions increase without limit, i.e., let $\kappa_1 \to \infty$, $\kappa_2 \to \infty$. Then, dividing 
(23) by $\kappa_1$ and (24) by $\kappa_2$, and taking this limit, we obtain the conditions (27). 


36.2. Least action vs. stationary action. The principle of least action is 
widely used not only in mechanics, but also in other branches of physics, 
e.g., in electrodynamics and field theory. However, as already noted (see 
Remark 2, p. 85), in a certain sense the principle is not quite true. For 
example, consider a simple harmonic oscillator, i.e., a particle of mass m 
oscillating about an equilibrium position under the action of an elastic 
restoring force (cf. Chap. 4, Prob. 2). The equation of motion of the par- 
ticle is 

$$m\ddot{x} + \kappa x = 0, \tag{28}$$

with solution 

$$x = C\sin(\omega t + \theta), \tag{29}$$

where 

$$\omega = \sqrt{\frac{\kappa}{m}}$$

and the values of the constants C, $\theta$ are determined from the initial conditions. 
Moreover, the particle has kinetic energy 

$$T = \tfrac{1}{2}m\dot{x}^2$$

and potential energy 

$$U = \tfrac{1}{2}\kappa x^2,$$

so that the action is 

$$\frac{1}{2}\int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2)\,dt. \tag{30}$$

Equation (28) is the Euler equation of the functional (30), but in general 
we cannot assert that its solution (29) actually minimizes (30). In fact, 
consider the solution 

$$x = \frac{1}{\omega}\sin\omega t, \tag{31}$$

which passes through the point x = 0, t = 0 and satisfies the condition 
$\dot{x}(0) = 1$. The point $(\pi/\omega, 0)$ is conjugate to the point (0, 0), since every 
extremal satisfying the condition x(0) = 0 intersects the extremal (31) at $(\pi/\omega, 0)$ 
[see p. 114]. Since 

$$F_{\dot{x}\dot{x}} = m > 0$$

for the functional (30), the extremal (31) satisfies the sufficient conditions 
for a minimum (in fact, a strong minimum), provided that 

$$0 \le t_0 < t_1 < \frac{\pi}{\omega}.$$

However, if we consider time intervals greater than $\pi/\omega$, we can no longer 
guarantee that the extremal (31) minimizes the functional (30). 
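
The role of the critical interval length π/ω can be seen in a short computation. The sketch below is an illustration under assumptions stated here, with hypothetical names and sample values m = 1, κ = 4 (so ω = 2). Because the action (30) is quadratic, its increment about any extremal, for a variation h(t) vanishing at both ends, equals the action of h itself; the sketch evaluates this for h(t) = ε sin(πt/T) on [0, T] by the midpoint rule and compares it with the closed form (ε²/4)(mπ²/T − κT). The increment is positive for T < π/ω and negative for T > π/ω, so on longer intervals the extremal is not a minimum.

```python
import math


def action_increment(m, kappa, T, eps=1.0, steps=20000):
    """Midpoint-rule value of (1/2) * integral of (m*h'**2 - kappa*h**2) over [0, T].

    The variation is h(t) = eps * sin(pi * t / T), which vanishes at t = 0 and t = T.
    """
    dt = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        h = eps * math.sin(math.pi * t / T)
        h_dot = eps * (math.pi / T) * math.cos(math.pi * t / T)
        total += 0.5 * (m * h_dot * h_dot - kappa * h * h) * dt
    return total


def closed_form(m, kappa, T, eps=1.0):
    """Exact value (eps**2 / 4) * (m * pi**2 / T - kappa * T) of the same integral."""
    return 0.25 * eps * eps * (m * math.pi ** 2 / T - kappa * T)


if __name__ == "__main__":
    m, kappa = 1.0, 4.0                   # sample mass and spring constant
    critical = math.pi / math.sqrt(kappa / m)
    for T in (1.0, critical, 2.0):
        print(T, action_increment(m, kappa, T), closed_form(m, kappa, T))
```

At T = π/ω the increment vanishes, which corresponds to the appearance of the conjugate point discussed above.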
Next, consider a system of n coupled oscillators, with kinetic energy 

$$T = \sum_{i,k=1}^n a_{ik}\dot{x}_i\dot{x}_k \tag{32}$$

(a quadratic form in the velocities $\dot{x}_i$) and potential energy 

$$U = \sum_{i,k=1}^n b_{ik}x_i x_k \tag{33}$$

(a quadratic form in the coordinates $x_i$). The quadratic form (32) is positive 
definite (since it is a kinetic energy); therefore, (32) and (33) can be simul- 
taneously reduced to sums of squares by a suitable linear transformation⁶ 

$$x_i = \sum_{k=1}^n c_{ik}q_k \qquad (i = 1, \ldots, n), \tag{34}$$

i.e., substitution of (34) into (32) and (33) gives 

$$T = \sum_{i=1}^n \dot{q}_i^2, \qquad U = \sum_{i=1}^n \lambda_i q_i^2.$$

Then the equations of motion of the system of oscillators are given by the 
Euler equations 

$$\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_i}\right) + \frac{\partial U}{\partial q_i} = 0 \qquad (i = 1, \ldots, n), \tag{35}$$

corresponding to the action functional 

$$\int_{t_0}^{t_1} \sum_{i=1}^n (\dot{q}_i^2 - \lambda_i q_i^2)\,dt.$$


® See e.g., G. E. Shilov, op. cit., Secs. 72 and 73. The coordinates q are often called 
normal coordinates, and the corresponding frequencies «, are called natural frequencies. 
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Suppose all the 4, are positive, which means that we are considering 
oscillations of the system about a position of stable equilibrium. Then 
the solution of the system (35) has the form 


$$q_i = C_i \sin\omega_i(t + \theta_i) \qquad (i = 1, \ldots, n), \tag{36}$$

where

$$\omega_i = \sqrt{\lambda_i},$$

and the values of the constants $C_i$, $\theta_i$ are determined from the initial
conditions. An argument like that made for the simple harmonic oscillator
($n = 1$) shows that a trajectory of the system [i.e., a curve given by (36)
in a space of $n + 1$ dimensions] whose projection on the time axis is of length
no greater than $\pi/\omega$, where

$$\omega = \max_{1 \le i \le n} \omega_i,$$

contains no conjugate points and satisfies the sufficient conditions for a
minimum. However, just as before, we cannot guarantee that a trajectory
whose projection on the time axis is of length greater than $\pi/\omega$ actually
minimizes the action.

Finally, consider a vibrating string of length $l$ with fixed ends.⁷ As shown
above, the function $u(x, t)$ describing the oscillations of the string satisfies
the equation

$$u_{tt}(x, t) = a^2 u_{xx}(x, t)$$

and the boundary conditions

$$u(0, t) = 0, \qquad u(l, t) = 0.$$

It follows that⁸

$$u(x, t) = \sum_{k=1}^{\infty} C_k(x) \sin\omega_k(t + \theta_k),$$

where

$$\omega_k = \frac{ak\pi}{l} \tag{37}$$

and $C_k(x)$, $\theta_k$ are determined from the initial conditions. Thus, in a certain
sense, a vibrating string can be regarded as a system of infinitely many
coupled oscillators, with natural frequencies (37). However, the numbers
(37) have no finite upper bound, and hence the analogy with the case of $n$
coupled oscillators leads us to believe that for a vibrating string, there is no
time interval short enough to guarantee that $u(x, t)$ actually minimizes
the action functional. Similar arguments can be carried out for other
systems with infinitely many degrees of freedom.

7 Unlike the analysis of a system of $n$ oscillators, the elementary argument that
follows is meant to be heuristic rather than rigorous.

8 See e.g., G. P. Tolstov, Fourier Series, translated by R. A. Silverman, Prentice-Hall,
Inc., Englewood Cliffs, N. J. (1962), p. 271.
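
The heuristic argument can be made concrete with a few numbers. For the $k$th mode alone, the oscillator argument guarantees a minimum only on time intervals shorter than $\pi/\omega_k = l/(ak)$, and this bound shrinks to zero as $k$ grows. The Python sketch below tabulates it; the function names and the values of $a$ and $l$ are assumptions chosen for illustration.

```python
import math

def natural_frequency(k, a, length):
    """Natural frequency of the k-th mode of a string of the given length,
    omega_k = a*k*pi/length, as in formula (37)."""
    return a * k * math.pi / length

def guaranteed_interval(k, a, length):
    """Length pi/omega_k of the time interval on which the k-th mode,
    taken by itself, is known to minimize the action."""
    return math.pi / natural_frequency(k, a, length)

a, length = 2.0, 1.0                     # illustrative values (assumed)
for k in (1, 10, 100, 1000):
    print(k, guaranteed_interval(k, a, length))
```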

Guided by the above considerations, we shall henceforth replace the
principle of least action by the principle of stationary action. In other words,
the actual trajectory of a given mechanical system will not be required to 
minimize the action but only to cause its first variation to vanish. 


36.3. The vibrating membrane. Consider the transverse motion of a
membrane (i.e., a homogeneous flexible sheet) of surface mass density $\rho$.
Let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the membrane at time $t$. The kinetic energy of the membrane at time $t$ is
given by

$$T = \frac{1}{2}\rho \iint_R u_t^2(x, y, t)\,dx\,dy, \tag{38}$$


where $R$ is the region of the $xy$-plane occupied by the membrane at rest.
The potential energy of the membrane in the position described by the
function $u(x, y, t)$, where $t$ is fixed, is just the work required to move the
membrane from its equilibrium position $u = 0$ into the given position
$u(x, y, t)$. This work is the sum of the work $U_1$ expended in deforming the
membrane and the work $U_2$ expended in moving the boundary of the
membrane, which we assume to be elastically fastened to its equilibrium position.

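To calculate $U_1$, let $\tau$ denote the tension in the membrane, and consider the
element $\Delta A$ of the membrane initially occupying the region $x_0 \le x \le x_0 + \Delta x$,
$y_0 \le y \le y_0 + \Delta y$. Then, just as in the case of the string, the work needed
to deform $\Delta A$ equals the product of $\tau$ and the increase in the area of $\Delta A$
under deformation, i.e.,

$$\begin{aligned}
&\tau\sqrt{(\Delta x)^2 + (\Delta u)^2}\,\sqrt{(\Delta y)^2 + (\Delta u)^2} - \tau\,\Delta x\,\Delta y \\
&\qquad \approx \frac{1}{2}\tau\left[\left(\frac{\Delta u}{\Delta x}\right)^2 + \left(\frac{\Delta u}{\Delta y}\right)^2\right]\Delta x\,\Delta y \\
&\qquad = \frac{1}{2}\tau\,[u_x^2(x_0, y_0, t) + u_y^2(x_0, y_0, t)]\,\Delta x\,\Delta y + \cdots,
\end{aligned} \tag{39}$$

where the dots indicate terms of order higher than those written. Integrating
(39) over $R$, we find that the work required to deform the whole membrane is

$$U_1 = \frac{1}{2}\tau \iint_R [u_x^2(x, y, t) + u_y^2(x, y, t)]\,dx\,dy. \tag{40}$$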
To calculate $U_2$, we generalize the argument used to derive (14). If $\Gamma$
is the boundary of the region $R$, and $s$ is arc length measured along $\Gamma$ from
some fixed point on $\Gamma$, then

$$U_2 = \frac{1}{2}\int_\Gamma \kappa(s)\,u^2(s, t)\,ds, \tag{41}$$

where $u(s, t)$ is the displacement of the membrane from equilibrium at the
point $s$ and time $t$, and $\kappa(s)$ is the linear density of the elastic modulus of
the forces retaining the boundary of the membrane.⁹ Combining (38), (40)
and (41), we find that the action functional for the vibrating membrane is

$$\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2)\,dt \\
&= \frac{1}{2}\int_{t_0}^{t_1}\!\!\iint_R [\rho u_t^2(x, y, t) - \tau(u_x^2(x, y, t) + u_y^2(x, y, t))]\,dx\,dy\,dt \\
&\quad - \frac{1}{2}\int_{t_0}^{t_1}\!\!\int_\Gamma \kappa(s)\,u^2(s, t)\,ds\,dt.
\end{aligned} \tag{42}$$

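Suppose we go from the function $u(x, y, t)$ to the "varied" function

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots.$$

Then, using formula (4) of Sec. 35 and dropping arguments of functions, we
find that the variation $\delta J$ of the functional (42) is

$$\begin{aligned}
\delta J ={}& \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt
 - \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \kappa u\psi\,ds\,dt \\
&- \varepsilon\tau\int_{t_0}^{t_1}\!\!\iint_R \left[\frac{\partial}{\partial x}(u_x\psi) + \frac{\partial}{\partial y}(u_y\psi)\right]dx\,dy\,dt
 + \varepsilon\rho\int_{t_0}^{t_1}\!\!\iint_R \frac{\partial}{\partial t}(u_t\psi)\,dx\,dy\,dt.
\end{aligned} \tag{43}$$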


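Just as in the case of the vibrating string, we assume that the function
$u(x, y, t)$ is not varied at the initial and final times, i.e., that

$$\psi(x, y, t_0) = \psi(x, y, t_1) = 0. \tag{44}$$

Because of (44), the last integral in (43) vanishes. Moreover, using Green's
theorem in two dimensions (see p. 23), we have

$$\begin{aligned}
\iint_R \left[\frac{\partial}{\partial x}(u_x\psi) + \frac{\partial}{\partial y}(u_y\psi)\right]dx\,dy
&= \int_\Gamma (u_x\psi\,dy - u_y\psi\,dx) \\
&= \int_\Gamma \psi\left[\frac{\partial u}{\partial x}\sin\left(\frac{\pi}{2} + \theta\right) - \frac{\partial u}{\partial y}\cos\left(\frac{\pi}{2} + \theta\right)\right]ds \\
&= \int_\Gamma \frac{\partial u}{\partial n}\,\psi\,ds,
\end{aligned}$$

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal to
$\Gamma$, and $\theta$ is the angle between $n$ and the $x$-axis. Thus, we can finally write
(43) in the form

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt
 - \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left(\kappa u + \tau\frac{\partial u}{\partial n}\right)\psi\,ds\,dt. \tag{45}$$

9 More precisely, let the parametric equations of $\Gamma$ be

$$x = x(s), \qquad y = y(s), \qquad s_0 \le s \le s_1.$$

Then $u(s, t)$ means $u[x(s), y(s), t]$, and "the point $s$" means the point $(x(s), y(s))$.

We first assume that

$$\psi(s, t) = 0 \qquad (s \in \Gamma), \tag{46}$$

where $t$ is arbitrary, i.e., that $u$ does not vary on the boundary of the
membrane. Then (45) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt. \tag{47}$$

Setting (47) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we find that

$$\rho u_{tt}(x, y, t) = \tau[u_{xx}(x, y, t) + u_{yy}(x, y, t)] \tag{48}$$

for $(x, y) \in R$ and all $t$, a result known as the equation of the vibrating
membrane.¹⁰ Equation (48) can also be written as

$$u_{tt}(x, y, t) = a^2\nabla^2 u(x, y, t)$$

(with $a^2 = \tau/\rho$), in terms of the Laplacian (operator)

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \tag{49}$$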
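
As a plausibility check of (48), the Python sketch below considers a rectangular membrane $0 \le x \le L_x$, $0 \le y \le L_y$ with fixed edges and the standing wave $u = \sin(\pi x/L_x)\sin(\pi y/L_y)\cos\omega t$, where $\omega^2 = (\tau/\rho)\,\pi^2(1/L_x^2 + 1/L_y^2)$. It evaluates the residual $\rho u_{tt} - \tau(u_{xx} + u_{yy})$ by central differences. The rectangular shape, the mode, the names `Lx`, `Ly`, `residual` and all numerical values are assumptions introduced only for this illustration.

```python
import math

rho, tau = 2.0, 3.0          # assumed surface density and tension
Lx, Ly = 1.0, 1.5            # assumed side lengths of the rectangle
omega = math.pi * math.sqrt((tau / rho) * (1.0 / Lx**2 + 1.0 / Ly**2))

def u(x, y, t):
    """Lowest standing-wave mode of a rectangular membrane with fixed edges."""
    return (math.sin(math.pi * x / Lx)
            * math.sin(math.pi * y / Ly)
            * math.cos(omega * t))

def residual(x, y, t, h=1e-3):
    """Central-difference estimate of rho*u_tt - tau*(u_xx + u_yy)."""
    center = u(x, y, t)
    u_tt = (u(x, y, t + h) - 2.0 * center + u(x, y, t - h)) / h**2
    u_xx = (u(x + h, y, t) - 2.0 * center + u(x - h, y, t)) / h**2
    u_yy = (u(x, y + h, t) - 2.0 * center + u(x, y - h, t)) / h**2
    return rho * u_tt - tau * (u_xx + u_yy)

# The residual should be small, limited only by the finite-difference error.
print(residual(0.3, 0.4, 0.2))
```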
Next, we remove the restriction (46). Since $u(x, y, t)$ must satisfy (48),
the first term in (45) vanishes, and we are left with

$$\delta J = -\varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[\kappa(s)u(s, t) + \tau\frac{\partial u(s, t)}{\partial n}\right]\psi(s, t)\,ds\,dt. \tag{50}$$

Then, since $\psi(s, t)$ is an arbitrary admissible function, equating (50) to zero
leads to the formula¹¹

$$\kappa(s)u(s, t) + \tau\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma). \tag{51}$$

This is the boundary condition satisfied by a vibrating membrane when its
boundary is elastically fastened to its equilibrium position. In particular,
if the boundary of the membrane is free, $\kappa(s) = 0$ and (51) becomes

$$\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{52}$$

while if the boundary of the membrane is fixed, $\kappa(s) = \infty$ and (51) becomes

$$u(s, t) = 0 \qquad (s \in \Gamma). \tag{53}$$

10 By $R \times [t_0, t_1]$ is meant the Cartesian product of $R$ and $[t_0, t_1]$, i.e., the set of all
points $(x, y, t)$ where $(x, y) \in R$ and $t \in [t_0, t_1]$.
11 The boundary conditions (51), (52) and (53) hold for all $t$.

36.4. The vibrating plate. Finally, we use the principle of stationary
action to derive the equation of motion and the boundary conditions for the
transverse vibrations of a plate (i.e., a homogeneous two-dimensional elastic
body) with surface mass density $\rho$. As in the case of the vibrating membrane,
let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the plate at time $t$. Then the kinetic energy of the plate at time $t$ is given by

$$T = \frac{1}{2}\rho \iint_R u_t^2(x, y, t)\,dx\,dy, \tag{54}$$


where $R$ is the region of the $xy$-plane occupied by the plate at rest [cf. (38)].

The potential energy of deformation of the plate, which we denote by $U_1$,
depends on how the plate is bent, and hence involves the second derivatives
$u_{xx}$, $u_{xy}$ and $u_{yy}$. Unlike the case of the membrane, it is assumed that no
work is done in stretching the plate, so that $U_1$ does not involve $u_x$ and $u_y$.
Moreover, we require $U_1$ to be a quadratic functional in $u_{xx}$, $u_{xy}$ and $u_{yy}$,¹²
which does not depend on the orientation of the coordinate system. Then,
since the matrix

$$\begin{pmatrix} u_{xx} & u_{xy} \\ u_{yx} & u_{yy} \end{pmatrix}$$

has just two invariants under rotations, i.e., its trace and its determinant,¹³
it follows that

$$U_1 = \iint_R [A(u_{xx} + u_{yy})^2 + B(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{55}$$

where $A$ and $B$ are constants. Equation (55) is usually written in the form

$$U_1 = \frac{1}{2}c \iint_R [(u_{xx} + u_{yy})^2 - 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{56}$$

where $c$ is a constant depending on the choice of units, and $\mu$ is an absolute
constant (Poisson's ratio) characterizing the material from which the plate is
made. For simplicity, we set $c = 1$.

In addition to the potential energy of deformation $U_1$, the total potential
energy of the plate may also contain a contribution $U_2$ due to bending
moments with density $m(s, t)$, prescribed on the boundary $\Gamma$ of $R$, and a
contribution $U_3$ due to external forces acting on $R$ with surface density
$f(x, y, t)$ and on $\Gamma$ with linear density $p(s, t)$. This would give

$$U_2 = \int_\Gamma m(s, t)\,\frac{\partial u(s, t)}{\partial n}\,ds, \tag{57}$$

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal
to $\Gamma$, and¹⁴

$$U_3 = \iint_R f(x, y, t)\,u(x, y, t)\,dx\,dy + \int_\Gamma p(s, t)\,u(s, t)\,ds. \tag{58}$$

12 This guarantees that the equation of motion of the plate is linear.
13 See e.g., G. E. Shilov, op. cit., p. 106.

Combining (54), (56), (57) and (58), we find that the action functional for
the vibrating plate is

$$\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2 - U_3)\,dt \\
&= \frac{1}{2}\int_{t_0}^{t_1}\!\!\iint_R [\rho u_t^2 - (u_{xx} + u_{yy})^2 + 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2) - 2fu]\,dx\,dy\,dt \\
&\quad - \int_{t_0}^{t_1}\!\!\int_\Gamma \left(pu + m\frac{\partial u}{\partial n}\right)ds\,dt.
\end{aligned} \tag{59}$$


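Unlike the corresponding expressions for the vibrating string and the
vibrating membrane, (59) contains second derivatives of the unknown
function $u$. The variation of (59) corresponding to the transition from
$u(x, y, t)$ to

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots$$

turns out to be (see Problems 4 and 5, p. 190)

$$\begin{aligned}
\delta J ={}& \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt \\
&+ \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right]ds\,dt.
\end{aligned} \tag{60}$$

Here,

$$M = -[\mu\nabla^2 u + (1 - \mu)(u_{xx}x_n^2 + 2u_{xy}x_n y_n + u_{yy}y_n^2)] \tag{61}$$

and

$$P = \frac{\partial}{\partial n}\nabla^2 u + (1 - \mu)\frac{\partial}{\partial s}[u_{xx}x_n x_s + u_{xy}(x_n y_s + x_s y_n) + u_{yy}y_n y_s], \tag{62}$$

where $\partial/\partial n$ denotes differentiation in the direction of the outward normal
to $\Gamma$, with direction cosines $x_n$, $y_n$, and $\partial/\partial s$ denotes differentiation in the
direction of the tangent to $\Gamma$, with direction cosines $x_s$, $y_s$. Moreover,

$$\nabla^4 u = \nabla^2\nabla^2 u = \frac{\partial^4 u}{\partial x^4} + 2\frac{\partial^4 u}{\partial x^2\,\partial y^2} + \frac{\partial^4 u}{\partial y^4},$$

according to (49).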
We first assume that

$$\psi(s, t) = 0, \qquad \frac{\partial\psi(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{63}$$

where $t$ is arbitrary, i.e., that $u$ and its normal derivative do not vary on the
boundary of the plate. Then (60) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt. \tag{64}$$

Setting (64) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we obtain the equation
for forced vibrations of the plate:¹⁵

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) + f(x, y, t) = 0. \tag{65}$$

If we set $f = 0$, so that there are no external forces acting on the plate, (65)
reduces to the equation for free vibrations of the plate

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) = 0.$$

Finally, if we set $u_{tt} = 0$ in (65) and assume that $f = f(x, y)$ is independent
of time, we obtain an equation for the equilibrium position of the plate
under the action of external forces:

$$\nabla^4 u(x, y) + f(x, y) = 0.$$

This equation could have been obtained directly from the condition for the
potential energy of the plate to have a minimum (see Remark 2 below).

14 An identical term might also have been included in the expression for the potential
energy of the vibrating membrane.

Next, we remove the restriction (63). Since $u(x, y, t)$ must satisfy (65),
the first term in (60) vanishes, and we are left with

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right]ds\,dt. \tag{66}$$

Then, since the functions $\psi$, $\partial\psi/\partial n$ and the interval $[t_0, t_1]$ are arbitrary,
equating (66) to zero leads to the natural boundary conditions

$$P(s, t) - p(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma). \tag{67}$$

If the boundary of the plate is clamped, the conditions (67) are replaced by
the "imposed" boundary conditions

$$u(s, t) = 0, \qquad \frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma).$$

If the plate is supported, i.e., if the boundary of the plate is held fixed while
the tangent plane at the boundary can vary, we obtain the boundary
conditions

$$u(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma).$$

15 When domains of arguments are not specified, it is understood that $t$ is arbitrary
and $(x, y) \in R$.

Remark 1. It should be noted that the Euler equation (65) does not involve
the coefficient $\mu$. This is explained by the fact that the expression

$$u_{xx}u_{yy} - u_{xy}^2 \tag{68}$$

is the divergence of the vector

$$(u_x u_{yy},\; -u_x u_{xy}),$$

and hence has no effect on (65). However, (68) does have a decisive effect
on the boundary conditions, via the functions $M(s, t)$ and $P(s, t)$.


Remark 2. For a mechanical system to be in equilibrium, its kinetic
energy $T$ must vanish and its potential energy $U$ must be independent of
time. Under these conditions, the principle of stationary action reduces to
the assertion that $\delta U = 0$. Thus, the equilibrium position of the system
corresponds to a stationary value of $U$. Moreover, it can be shown that this
stationary value must be a minimum if the equilibrium is to be stable and
hence physically realizable. In elasticity theory, this principle of minimum
potential energy is often replaced by Castigliano's principle, which states
that the equilibrium position of an elastic body corresponds to a minimum
of the work of deformation.¹⁶


37. Variation of a Functional Defined on a Variable Region 


37.1. Statement of the problem. In Sec. 35, we derived a formula for the
variation of the functional

$$J[u] = \int\!\cdots\!\int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{69}$$

allowing only the function $u$ (and hence its derivatives) to vary, while leaving
the independent variables (and hence the region of integration $R$) unchanged.
We now find the variation of the functional (69) in the general case where the
independent variables $x_1, \ldots, x_n$ are varied, as well as the function $u$ and its
derivatives. For simplicity, we use vector notation, writing $x = (x_1, \ldots, x_n)$,
$dx = dx_1 \cdots dx_n$ and

$$\operatorname{grad} u \equiv \nabla u = (u_{x_1}, \ldots, u_{x_n}).$$

With this notation, (69) becomes

$$J[u] = \int_R F(x, u, \nabla u)\,dx. \tag{70}$$

16 For a detailed treatment of Castigliano's principle and a proof of its equivalence
to the principle of minimum potential energy, see e.g., R. Courant and D. Hilbert,
Methods of Mathematical Physics, Vol. 1, Interscience, Inc., New York (1953), pp. 268-272.

Now consider the family of transformations¹⁷

$$x_i^* = \Phi_i(x, u, \nabla u; \varepsilon), \qquad u^* = \Psi(x, u, \nabla u; \varepsilon), \tag{71}$$

depending on a parameter $\varepsilon$, where the functions $\Phi_i$ ($i = 1, \ldots, n$) and $\Psi$
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$\Phi_i(x, u, \nabla u; 0) = x_i, \qquad \Psi(x, u, \nabla u; 0) = u. \tag{72}$$

The transformation (71) carries the surface $\sigma$, with the equation

$$u = u(x) \qquad (x \in R),$$

into another surface $\sigma^*$. In fact, replacing $u$, $\nabla u$ in (71) by $u(x)$, $\nabla u(x)$,
and eliminating $x$ from the resulting $n + 1$ equations, we obtain the equation

$$u^* = u^*(x^*) \qquad (x^* \in R^*)$$

for $\sigma^*$, where $x^* = (x_1^*, \ldots, x_n^*)$, and $R^*$ is a new $n$-dimensional region.
Thus, the transformation (71) carries the functional $J[u(x)]$ into

$$J[u^*(x^*)] = \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^*,$$

where

$$\nabla^* u^* = (u^*_{x_1^*}, \ldots, u^*_{x_n^*}).$$

Our goal in this section is to calculate the variation of the functional (70)
corresponding to the transformation from $x$, $u(x)$ to $x^*$, $u^*(x^*)$, i.e., the
principal linear part (relative to $\varepsilon$) of the difference

$$J[u^*(x^*)] - J[u(x)]. \tag{73}$$

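37.2. Calculation of $\delta x_i$ and $\delta u$. As in the proof of Noether's theorem for
one-dimensional regions (see p. 82), suppose $\varepsilon$ is a small quantity. Then,
by Taylor's theorem, we have

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) = \Phi_i(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) = \Psi(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),
\end{aligned}$$

or using (72),

$$\begin{aligned}
x_i^* &= x_i + \varepsilon\varphi_i(x, u, \nabla u) + o(\varepsilon), \\
u^* &= u + \varepsilon\psi(x, u, \nabla u) + o(\varepsilon),
\end{aligned} \tag{74}$$

where

$$\varphi_i(x, u, \nabla u) = \left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}, \qquad
\psi(x, u, \nabla u) = \left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}. \tag{75}$$

For a given surface $\sigma$, with equation $u = u(x)$, (74) leads to the increments

$$\Delta x_i = x_i^* - x_i = \varepsilon\varphi_i(x) + o(\varepsilon) \tag{76}$$

and

$$\Delta u = u^*(x^*) - u(x) = \varepsilon\psi(x) + o(\varepsilon), \tag{77}$$

where we explicitly indicate the arguments $x$ and $x^*$ at which the functions
$u$ and $u^*$ are evaluated, and $\varphi_i(x)$, $\psi(x)$ denote the functions (75) with $u$, $\nabla u$
replaced by $u(x)$, $\nabla u(x)$. Formula (77) gives an expression for the change in
$u$-coordinate as we go from the point $(x, u(x))$ on the surface $\sigma$ to its image
$(x^*, u^*(x^*))$ under the transformation (74). The variations $\delta x_i$ and $\delta u$
corresponding to (74) are defined as the principal linear parts (relative to $\varepsilon$)
of the increments (76) and (77), i.e.,

$$\delta x_i = \varepsilon\varphi_i(x), \qquad \delta u = \varepsilon\psi(x). \tag{78}$$

We must also consider the increment

$$\overline{\Delta u} = u^*(x) - u(x),$$

i.e., the change in $u$-coordinate as we go from the point $(x, u(x))$ to the
point $(x, u^*(x))$ on the surface $\sigma^*$ with the same $x$-coordinate, where $\sigma^*$ is the
image of the surface $\sigma$ under the transformation (74). Imitating (77) and
(78), we introduce a new function $\bar\psi(x)$ and a corresponding variation $\overline{\delta u}$:

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon\bar\psi(x) + o(\varepsilon), \qquad \overline{\delta u} = \varepsilon\bar\psi(x).$$

17 These formulas, with $n$ independent variables and 1 unknown function, should be
contrasted with the formulas (45) of Sec. 20, with $n$ unknown functions and 1 independent
variable.
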
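To find the relation between $\bar\psi$ and $\psi$, or equivalently, between $\overline{\delta u}$ and $\delta u$,
we write

$$\begin{aligned}
\Delta u &= u^*(x^*) - u(x) = [u^*(x^*) - u^*(x)] + [u^*(x) - u(x)] \\
&= \sum_{i=1}^{n} \frac{\partial u^*}{\partial x_i}(x_i^* - x_i) + \overline{\delta u} + o(\varepsilon) \\
&= \sum_{i=1}^{n} \frac{\partial u^*}{\partial x_i}\,\delta x_i + \overline{\delta u} + o(\varepsilon).
\end{aligned} \tag{79}$$

Since $\partial u^*/\partial x_i$ and $\partial u/\partial x_i$ differ only by a quantity of order $\varepsilon$, (79) becomes

$$\Delta u \sim \sum_{i=1}^{n} \frac{\partial u}{\partial x_i}\,\delta x_i + \overline{\delta u},$$

where the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$. But $\Delta u \sim \delta u$, since $\delta u$ is the principal part of $\Delta u$, and hence

$$\delta u = \overline{\delta u} + \sum_{i=1}^{n} u_{x_i}\,\delta x_i. \tag{80}$$

Moreover, since

$$\delta u = \varepsilon\psi, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_i = \varepsilon\varphi_i,$$

(80) also implies

$$\psi = \bar\psi + \sum_{i=1}^{n} u_{x_i}\varphi_i. \tag{81}$$
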


Example. Let $u$ be a function of a single independent variable $x$, and let
(71) be the transformation

$$\begin{aligned}
x^* &= x\cos\varepsilon - u(x)\sin\varepsilon = x - \varepsilon u(x) + o(\varepsilon), \\
u^*(x^*) &= x\sin\varepsilon + u(x)\cos\varepsilon = \varepsilon x + u(x) + o(\varepsilon),
\end{aligned} \tag{82}$$

i.e., a counterclockwise rotation of the $xu$-plane through the small angle $\varepsilon$.
As shown in Figure 10, (82) carries the point $(x, u(x))$ on the curve $\gamma$ with
equation $u = u(x)$ into the point $(x^*, u^*(x^*))$ on its image $\gamma^*$ with equation
$u^* = u^*(x^*)$. It follows from (82) that

$$\delta x = -\varepsilon u(x), \qquad \delta u = \varepsilon x \tag{83}$$

and

$$\varphi(x) = -u(x), \qquad \psi(x) = x. \tag{84}$$

In fact, the expressions (83) can be read directly off the figure, as the
components of the vector joining the point $(x, u(x))$ to the point $(x^*, u^*(x^*))$.

FIGURE 10

Moreover,

$$u^*(x) = u^*[x^* + \varepsilon u(x)] + o(\varepsilon) = u^*(x^*) + \varepsilon u(x)\,u^{*\prime}(x^*) + o(\varepsilon),$$

and since $u^{*\prime}(x^*)$ and $u'(x)$ differ only by a quantity of order $\varepsilon$, we have

$$u^*(x) = u^*(x^*) + \varepsilon u(x)\,u'(x) + o(\varepsilon).$$

On the other hand, according to the second of the formulas (82),

$$u^*(x^*) = \varepsilon x + u(x) + o(\varepsilon).$$

It follows that

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon[x + u'(x)u(x)] + o(\varepsilon)$$

and

$$\overline{\delta u} = \varepsilon[x + u(x)u'(x)], \qquad \bar\psi(x) = x + u(x)u'(x). \tag{85}$$

Using (83) and (84), we can write (85) as

$$\delta u = \overline{\delta u} + u'\,\delta x, \qquad \psi = \bar\psi + u'\varphi,$$
in complete agreement with (80) and (81). 
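
The result (85) can be checked numerically. The Python sketch below rotates a sample curve through a small angle, finds the height $u^*(x)$ of the rotated curve above a fixed abscissa $x$, and compares $[u^*(x) - u(x)]/\varepsilon$ with $x + u(x)u'(x)$. The sample curve $u(x) = x^2$, the function names and the fixed-point iteration used to locate the preimage point are all assumptions made for illustration.

```python
import math

def u(x):
    return x * x            # sample curve, chosen only for illustration

def du(x):
    return 2.0 * x          # its derivative u'(x)

def rotated_curve_value(x, eps, iterations=50):
    """Return u*(x), the height of the rotated curve above the abscissa x."""
    x0 = x
    for _ in range(iterations):
        # Solve x = x0*cos(eps) - u(x0)*sin(eps) for the preimage abscissa x0.
        x0 = (x + u(x0) * math.sin(eps)) / math.cos(eps)
    # Second formula of (82), evaluated at the preimage point (x0, u(x0)).
    return x0 * math.sin(eps) + u(x0) * math.cos(eps)

x, eps = 0.5, 1e-5
numerical = (rotated_curve_value(x, eps) - u(x)) / eps
predicted = x + u(x) * du(x)     # the function psi-bar of formula (85)

print(numerical, predicted)
```

The two printed numbers agree up to terms of order $\varepsilon$, as expected from the definition of a principal linear part.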
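37.3. Calculation of $\delta u_{x_i}$. We now derive an expression for the quantity

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i},$$

or more precisely, its principal part $\delta u_{x_i}$, which will be required later when
we calculate the increment (73). First, we note that according to (74),¹⁸

$$\frac{\partial x_i^*}{\partial x_k} \sim \delta_{ik} + \varepsilon\frac{\partial\varphi_i}{\partial x_k}, \tag{86}$$

where $\delta_{ik}$ is the Kronecker delta, equal to 1 if $i = k$ and 0 otherwise. It
follows that

$$\frac{\partial x_k}{\partial x_i^*} \sim \delta_{ik} - \varepsilon\frac{\partial\varphi_k}{\partial x_i},$$

i.e.,

$$\frac{\partial}{\partial x_i^*} \sim \frac{\partial}{\partial x_i} - \varepsilon\sum_{k=1}^{n} \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k}. \tag{87}$$

Next we write

$$\begin{aligned}
\Delta u_{x_i} &= \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i} \\
&= \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} + \frac{\partial[u(x^*) - u(x)]}{\partial x_i}
 + \left(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\right)u(x^*),
\end{aligned}$$

and analyze each of the three terms in the right-hand side separately. Using
(87) and the fact that

$$u^*(x^*) - u(x^*) \sim \varepsilon\bar\psi(x^*),$$

we have

$$\frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} \sim \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i}
 \sim \varepsilon\frac{\partial\bar\psi(x^*)}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi(x)}{\partial x_i}. \tag{88}$$

Moreover, it is easily verified that

$$\frac{\partial[u(x^*) - u(x)]}{\partial x_i} \sim \varepsilon\frac{\partial}{\partial x_i}\sum_{k=1}^{n} \frac{\partial u(x)}{\partial x_k}\varphi_k(x)
 = \varepsilon\sum_{k=1}^{n} \frac{\partial^2 u(x)}{\partial x_i\,\partial x_k}\varphi_k(x)
 + \varepsilon\sum_{k=1}^{n} \frac{\partial u(x)}{\partial x_k}\frac{\partial\varphi_k(x)}{\partial x_i} \tag{89}$$

and

$$\left(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\right)u(x^*) \sim \left(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\right)u(x)
 \sim -\varepsilon\sum_{k=1}^{n} \frac{\partial\varphi_k}{\partial x_i}\frac{\partial u(x)}{\partial x_k}. \tag{90}$$

18 In expressions like $\partial\varphi_k/\partial x_i$, $u$ is regarded as a function, i.e., the value of $u$ is not held
fixed, as might be inferred from the somewhat ambiguous notation for partial derivatives.
Actually, $\partial\varphi_k/\partial x_i$ means

$$\frac{\partial}{\partial x_i}\varphi_k[x, u(x), \nabla u(x)].$$

Adding equations (88), (89) and (90), we obtain

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i}
 \sim \varepsilon\left(\frac{\partial\bar\psi}{\partial x_i} + \sum_{k=1}^{n} \frac{\partial^2 u}{\partial x_i\,\partial x_k}\varphi_k\right). \tag{91}$$

Finally, recalling that

$$\Delta u_{x_i} \sim \delta u_{x_i}, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_k = \varepsilon\varphi_k,$$

we can write (91) as

$$\delta u_{x_i} = (\overline{\delta u})_{x_i} + \sum_{k=1}^{n} u_{x_i x_k}\,\delta x_k. \tag{92}$$


37.4. Calculation of $\delta J$. We are now in a position to calculate the
variation of a functional defined on a variable domain.


THEOREM 1. The variation of the functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{93}$$

corresponding to the transformation¹⁹

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon\psi(x, u, \nabla u)
\end{aligned} \tag{94}$$

($i = 1, \ldots, n$) is given by the formula

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\bar\psi\,dx
 + \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx, \tag{95}$$

where

$$\bar\psi = \psi - \sum_{i=1}^{n} u_{x_i}\varphi_i.$$
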

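Proof. Here, $\delta J$ means the principal linear part (relative to $\varepsilon$) of
the increment

$$\Delta J = J[u^*(x^*)] - J[u(x)], \tag{96}$$

where $u^*(x^*)$ is the image of $u(x)$ under the transformation (94). By
definition, (96) equals

$$\begin{aligned}
\Delta J &= \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^* - \int_R F(x, u, \nabla u)\,dx \\
&= \int_R \left[F(x^*, u^*, \nabla^* u^*)\,\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)} - F(x, u, \nabla u)\right]dx,
\end{aligned} \tag{97}$$

where

$$\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)}$$

is the Jacobian of the transformation from the variables $x_1, \ldots, x_n$ to
the variables $x_1^*, \ldots, x_n^*$. According to (86), this Jacobian is

$$\begin{vmatrix}
1 + \varepsilon\dfrac{\partial\varphi_1}{\partial x_1} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_1} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_1} \\
\varepsilon\dfrac{\partial\varphi_1}{\partial x_2} & 1 + \varepsilon\dfrac{\partial\varphi_2}{\partial x_2} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\varepsilon\dfrac{\partial\varphi_1}{\partial x_n} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_n} & \cdots & 1 + \varepsilon\dfrac{\partial\varphi_n}{\partial x_n}
\end{vmatrix}
\sim \left(1 + \varepsilon\frac{\partial\varphi_1}{\partial x_1}\right)\cdots\left(1 + \varepsilon\frac{\partial\varphi_n}{\partial x_n}\right)
\sim 1 + \varepsilon\sum_{i=1}^{n} \frac{\partial\varphi_i}{\partial x_i},$$

and hence we can write (97) as

$$\Delta J \sim \int_R \left[F(x^*, u^*, \nabla^* u^*)\left(1 + \varepsilon\sum_{i=1}^{n} \frac{\partial\varphi_i}{\partial x_i}\right) - F(x, u, \nabla u)\right]dx. \tag{98}$$

19 As usual, the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$.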


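Using Taylor's theorem to expand the integrand of (98), and retaining
only terms of order 1 relative to $\varepsilon$, we find that

$$\delta J = \int_R \left[\sum_{i=1}^{n} F_{x_i}\,\delta x_i + F_u\,\delta u + \sum_{i=1}^{n} F_{u_{x_i}}\,\delta u_{x_i}
 + \varepsilon F\sum_{i=1}^{n} \frac{\partial\varphi_i}{\partial x_i}\right]dx. \tag{99}$$

Then, since $\delta x_i = \varepsilon\varphi_i$, substitution of (80) and (92) into (99) gives

$$\begin{aligned}
\delta J = \int_R \Biggl[&\sum_{i=1}^{n} F_{x_i}\,\delta x_i + F_u\,\overline{\delta u} + F_u\sum_{i=1}^{n} u_{x_i}\,\delta x_i
 + \sum_{i=1}^{n} F_{u_{x_i}}(\overline{\delta u})_{x_i} \\
&+ \sum_{i,k=1}^{n} F_{u_{x_i}} u_{x_i x_k}\,\delta x_k + F\sum_{i=1}^{n} (\delta x_i)_{x_i}\Biggr]dx.
\end{aligned} \tag{100}$$

As in the case of a fixed domain $R$, we try to represent the integrand
of (100) as an expression of the form²⁰

$$G(x)\,\overline{\delta u} + \operatorname{div}(\cdots)$$

(cf. p. 153). This can be achieved by noting that

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i}(F\,\delta x_i) = \sum_{i=1}^{n} F_{x_i}\,\delta x_i + \sum_{i=1}^{n} F(\delta x_i)_{x_i}
 + \sum_{i=1}^{n} F_u u_{x_i}\,\delta x_i + \sum_{i,k=1}^{n} F_{u_{x_k}} u_{x_k x_i}\,\delta x_i$$

and

$$\sum_{i=1}^{n} F_{u_{x_i}}(\overline{\delta u})_{x_i} = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\overline{\delta u}\right)
 - \sum_{i=1}^{n} \left(\frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}.$$

(The last formula resembles an integration by parts.) Thus, finally,
we have

$$\delta J = \int_R \left(F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}\,dx
 + \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\overline{\delta u} + F\,\delta x_i\right)dx, \tag{101}$$

which is the same as formula (95), since $\overline{\delta u} = \varepsilon\bar\psi$, $\delta x_i = \varepsilon\varphi_i$. This
proves the theorem.

20 Then, because of the $n$-dimensional version of Green's theorem [see formula (5)],
the second term of (101) can be transformed into a surface integral.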
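
The Jacobian estimate used in the proof, namely that the determinant of the matrix with entries $\delta_{ik} + \varepsilon\,\partial\varphi_i/\partial x_k$ equals $1 + \varepsilon\sum_i \partial\varphi_i/\partial x_i$ up to terms of order $\varepsilon^2$, is easy to test numerically. The Python sketch below does this for $n = 3$; the matrix `dphi` of sample derivative values and the helper names are hypothetical and serve only as an illustration.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

# Sample values of the derivatives d(phi_i)/d(x_k); arbitrary illustrative numbers.
dphi = [[0.3, -1.2, 0.5],
        [0.7, 0.1, -0.4],
        [-0.6, 0.9, 0.2]]

def jacobian_error(eps):
    """Difference between det(I + eps*dphi) and the first-order estimate 1 + eps*trace."""
    jac = [[(1.0 if i == k else 0.0) + eps * dphi[i][k] for k in range(3)]
           for i in range(3)]
    trace = sum(dphi[i][i] for i in range(3))
    return det3(jac) - (1.0 + eps * trace)

# Shrinking eps by a factor of 10 should shrink the error by roughly 100.
ratio = jacobian_error(1e-2) / jacobian_error(1e-3)
print(ratio)
```

A ratio close to 100 indicates that the neglected terms are of second order in $\varepsilon$, which is all that the proof requires.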


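Remark 1. In the special case where the function $u$ and its derivatives
are varied, but not the independent variables $x_i$, we have

$$\varphi_i = 0, \qquad \bar\psi = \psi - \sum_{i=1}^{n} u_{x_i}\varphi_i = \psi,$$

and (95) becomes

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi(x)\,dx
 + \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left[F_{u_{x_i}}\psi(x)\right]dx,$$

which is identical with formula (4) of Sec. 35.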


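Remark 2. The formula for the variation of the functional $J[u]$ is ordinarily
used in the case where $u = u(x)$ is an extremal surface of $J[u]$, i.e., satisfies
the Euler equation

$$F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}F_{u_{x_i}} = 0.$$

Then (95) reduces to

$$\delta J = \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx$$

in the general case, and to

$$\delta J = \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\psi\right)dx$$

in the case where the independent variables $x_i$ are not varied.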


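Remark 3. Consider the functional

$$J[u_1, \ldots, u_m] = \int_R F\left(x, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots, \frac{\partial u_m}{\partial x_n}\right)dx, \tag{102}$$

involving $m$ unknown functions $u_1, \ldots, u_m$ and their derivatives

$$\frac{\partial u_j}{\partial x_i} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, m). \tag{103}$$

Introducing the vector $u = (u_1, \ldots, u_m)$ and interpreting $\nabla u$ as the tensor
with components (103), we can still write (102) in the form

$$J[u] = \int_R F(x, u, \nabla u)\,dx.$$

Then, if (94) is replaced by the transformation

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon\psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned} \tag{104}$$

the formula (95) generalizes to

$$\begin{aligned}
\delta J ={}& \varepsilon\int_R \sum_{j=1}^{m} \left(F_{u_j} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\right)\bar\psi_j\,dx \\
&+ \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(\sum_{j=1}^{m} \frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\bar\psi_j + F\varphi_i\right)dx,
\end{aligned} \tag{105}$$

where

$$\bar\psi_j = \psi_j - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i}\varphi_i.$$
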
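Remark 4. Let (104) be replaced by the more general transformation

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k\varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k\psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned}$$

depending on $r$ parameters $\varepsilon_1, \ldots, \varepsilon_r$, where $\varepsilon$ means the vector $(\varepsilon_1, \ldots, \varepsilon_r)$
and the symbol $\sim$ denotes equality except for quantities of order higher than
1 relative to $\varepsilon_1, \ldots, \varepsilon_r$. Then, formula (105) generalizes further to

$$\begin{aligned}
\delta J ={}& \sum_{k=1}^{r} \varepsilon_k\int_R \sum_{j=1}^{m} \left(F_{u_j} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\right)\bar\psi_j^{(k)}\,dx \\
&+ \sum_{k=1}^{r} \varepsilon_k\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(\sum_{j=1}^{m} \frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\right)dx,
\end{aligned}$$

where

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$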

37.5. Noether's theorem. Using formula (95) for the variation of a
functional, we can deduce an important theorem due to Noether, concerning
"invariant variational problems." This theorem has already been proved
in Sec. 20 for the case of a single independent variable. Suppose we have a
functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{106}$$

and a transformation

$$x_i^* = \Phi_i(x, u, \nabla u), \qquad u^* = \Psi(x, u, \nabla u) \tag{107}$$

($i = 1, \ldots, n$) carrying the surface $\sigma$ with equation $u = u(x)$ into the surface
$\sigma^*$ with equation $u^* = u^*(x^*)$, in the way described on p. 169.

DEFINITION.²¹ The functional (106) is said to be invariant under the
transformation (107) if $J[\sigma^*] = J[\sigma]$, i.e., if

$$\int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^* = \int_R F(x, u, \nabla u)\,dx.$$


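Example. The functional

$$J[\sigma] = \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]dx\,dy$$

is invariant under the rotation

$$\begin{aligned}
x^* &= x\cos\varepsilon - y\sin\varepsilon, \\
y^* &= x\sin\varepsilon + y\cos\varepsilon, \\
u^* &= u,
\end{aligned} \tag{108}$$

where $\varepsilon$ is an arbitrary constant. In fact, since the inverse of the
transformation (108) is

$$\begin{aligned}
x &= x^*\cos\varepsilon + y^*\sin\varepsilon, \\
y &= -x^*\sin\varepsilon + y^*\cos\varepsilon, \\
u &= u^*,
\end{aligned}$$

it follows that, given a surface $\sigma$ with equation $u = u(x, y)$, the "transformed"
surface $\sigma^*$ has the equation

$$u^* = u(x^*\cos\varepsilon + y^*\sin\varepsilon,\; -x^*\sin\varepsilon + y^*\cos\varepsilon) = u^*(x^*, y^*).$$

Consequently, we have

$$\begin{aligned}
J[\sigma^*] &= \iint_{R^*} \left[\left(\frac{\partial u^*}{\partial x^*}\right)^2 + \left(\frac{\partial u^*}{\partial y^*}\right)^2\right]dx^*\,dy^* \\
&= \iint_{R^*} \left[\left(\frac{\partial u}{\partial x}\cos\varepsilon - \frac{\partial u}{\partial y}\sin\varepsilon\right)^2
 + \left(\frac{\partial u}{\partial x}\sin\varepsilon + \frac{\partial u}{\partial y}\cos\varepsilon\right)^2\right]dx^*\,dy^* \\
&= \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]\frac{\partial(x^*, y^*)}{\partial(x, y)}\,dx\,dy
 = \iint_R \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]dx\,dy = J[\sigma].
\end{aligned}$$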


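THEOREM 2 (Noether). If the functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{109}$$

is invariant under the family of transformations

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon\psi(x, u, \nabla u)
\end{aligned} \tag{110}$$

($i = 1, \ldots, n$) for an arbitrary region $R$, then

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right) = 0 \tag{111}$$

on each extremal surface of $J[u]$, where

$$\bar\psi = \psi - \sum_{i=1}^{n} u_{x_i}\varphi_i.$$

Proof. According to formula (95),

$$\delta J = \varepsilon\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx,$$

if $u = u(x)$ is an extremal surface. Since $J[u]$ is invariant under (110),
$\delta J = 0$, and since $R$ is arbitrary, this implies (111), as asserted.

21 Cf. the analogous definition on p. 80 and the subsequent examples.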


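Remark 1. If we drop the requirement that $u = u(x)$ be an extremal
surface of $J[u]$, then, using (95) again, we find that (111) is replaced by

$$\left(F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\bar\psi
 + \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right) = 0.$$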


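Remark 2. If there are $m$ unknown functions $u_1, \ldots, u_m$, we introduce
the vector $u = (u_1, \ldots, u_m)$ and continue to write (109), as in Remark 3,
p. 175. Then invariance of $J[u]$ under the family of transformations

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon\psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}$$

implies that

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(\sum_{j=1}^{m} \frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\bar\psi_j + F\varphi_i\right) = 0, \tag{112}$$

where

$$\bar\psi_j = \psi_j - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i}\varphi_i.$$

In particular, for $n = 1$, (112) reduces to

$$\frac{d}{dx}\left(\sum_{j=1}^{m} F_{u_j'}\bar\psi_j + F\varphi\right) = 0,$$

or

$$\sum_{j=1}^{m} F_{u_j'}\psi_j + \left(F - \sum_{j=1}^{m} u_j' F_{u_j'}\right)\varphi = \text{const} \tag{113}$$

along each extremal. This is precisely the version of Noether's theorem
proved in Sec. 20. In other words, the left-hand side of (113) is a first
integral of the system of Euler equations

$$\frac{d}{dx}F_{u_j'} - F_{u_j} = 0 \qquad (j = 1, \ldots, m).$$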
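
As an illustration of the first integral (113), consider the harmonic oscillator of Sec. 36.2, with the time $t$ as the single independent variable, one unknown function $x(t)$, and integrand $F = \frac{1}{2}m\dot{x}^2 - \frac{1}{2}\kappa x^2$. Since $F$ does not depend explicitly on $t$, the functional is invariant under the time translation $t^* = t + \varepsilon$, $x^* = x$, i.e., $\varphi = 1$, $\psi = 0$. The Python sketch below evaluates the left-hand side of (113) along an extremal $x = C\sin\omega(t + \theta)$ at several times. The numerical values and function names are assumptions chosen only for illustration.

```python
import math

m, kappa = 1.5, 6.0                  # illustrative values (assumed)
omega = math.sqrt(kappa / m)
C, theta = 0.7, 0.3                  # arbitrary amplitude and phase

def x(t):
    return C * math.sin(omega * (t + theta))

def xdot(t):
    return C * omega * math.cos(omega * (t + theta))

def F(t):
    """Integrand of the action evaluated along the extremal."""
    return 0.5 * m * xdot(t) ** 2 - 0.5 * kappa * x(t) ** 2

def F_xdot(t):
    """Partial derivative of F with respect to the velocity."""
    return m * xdot(t)

phi, psi = 1.0, 0.0                  # time translation: t* = t + eps, x* = x

def first_integral(t):
    """Left-hand side of (113) for one unknown function."""
    return F_xdot(t) * psi + (F(t) - xdot(t) * F_xdot(t)) * phi

values = [first_integral(t) for t in (0.0, 0.4, 1.3, 2.9)]
print(values)
```

All the printed values coincide (up to rounding) with $-\frac{1}{2}\kappa C^2$, the negative of the total energy, so in this case the first integral is just the law of conservation of energy.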


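Remark 3. Invariance of the functional (109) under the $r$-parameter family
of transformations (see Remark 4, p. 176)

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k\varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k\psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}$$

implies the existence of $r$ linearly independent relations

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(\sum_{j=1}^{m} \frac{\partial F}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\right) = 0 \qquad (k = 1, \ldots, r), \tag{114}$$

where

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$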

Remark 4. Suppose the functional $J[u]$ is invariant under a family of
transformations depending on $r$ arbitrary functions instead of $r$ arbitrary
parameters. Then, according to another theorem of Noether (which will
not be proved here), there are $r$ identities connecting the left-hand sides of
the Euler equations corresponding to $J[u]$. For example, consider the
simplest variational problem in parametric form, involving a functional

$$J[x, y] = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\,dt, \tag{115}$$

where $\Phi$ is a positive-homogeneous function of degree 1 in $\dot{x}(t)$ and $\dot{y}(t)$
(see Sec. 10). Then, as already noted on p. 39, $J[x, y]$ does not change if
we introduce a new parameter $\tau$ by setting $t = t(\tau)$, where $dt/d\tau > 0$, and
in fact, the left-hand sides of the Euler equations

$$\Phi_x - \frac{d}{dt}\Phi_{\dot{x}} = 0, \qquad \Phi_y - \frac{d}{dt}\Phi_{\dot{y}} = 0$$

corresponding to (115) are connected by the identity

$$\dot{x}\left(\Phi_x - \frac{d}{dt}\Phi_{\dot{x}}\right) + \dot{y}\left(\Phi_y - \frac{d}{dt}\Phi_{\dot{y}}\right) = 0.$$

Another interesting example of a family of transformations depending
on an arbitrary function, i.e., the gauge transformations of electrodynamics,
will be given in Sec. 39.

38. Applications to Field Theory


38.1. The principle of stationary action for fields. In Sec. 36, we discussed
the application of the principle of stationary action to vibrating systems with
infinitely many degrees of freedom. These systems were characterized by
a function $u(x, t)$ or $u(x, y, t)$ giving the transverse displacement of the system
from its equilibrium position. More generally, consider a physical system
(not necessarily mechanical) characterized by one function

$$u(t, x_1, \ldots, x_n) \tag{116}$$

or by a set of functions

$$u_j(t, x_1, \ldots, x_n) \qquad (j = 1, \ldots, m),$$

depending on the time $t$ and the space coordinates $x_1, \ldots, x_n$.²² Such a
system is called a field [not to be confused with the concept of a field (of
directions) treated in Chap. 6], and the functions $u_j$ are called the field
functions. As usual, we can simplify the notation by interpreting (116) as
a vector function $u = (u_1, \ldots, u_m)$ in the case where $m > 1$. It is also
convenient to write

$$t = x_0, \qquad x = (x_0, x_1, \ldots, x_n), \qquad dx = dx_0\,dx_1 \cdots dx_n.$$

Then the field function (116) becomes simply $u(x)$.
In the case of the simple vibrating systems studied in Sec. 36, the equations
of motion for the system were derived by first calculating the action functional

$$\int_a^b (T - U)\, dt,$$

where $T$ is the kinetic energy and $U$ the potential energy of the system, and
then invoking the principle of stationary action. Similarly, the field equations
of many other physical fields can be derived from a suitably defined action functional.
By analogy with the vibrating string and the vibrating membrane, we write
the action in the form²³

$$J[u] = \int_a^b dx_0 \int \cdots \int_R \mathcal{L}(u, \nabla u)\, dx_1 \cdots dx_n = \int_\Omega \mathcal{L}(u, \nabla u)\, dx, \tag{117}$$


22 We deliberately write the argument $t$ first, since it will soon be denoted by $x_0$.
In physical problems, $n$ can only take the values 1, 2 or 3. However, the choice of $m$
is not restricted, corresponding to the possibility of scalar fields, vector fields, tensor
fields, etc.

23 The aptness of this way of writing the action will be apparent from the examples.
In the treatment of vibrating systems given in Sec. 36, we did not explicitly introduce
the functions $L = T - U$ and $\mathcal{L}$. Of course, in some cases, e.g., the vibrating plate,
$\mathcal{L}$ must involve higher-order derivatives.

where $\nabla$ is the operator

$$\nabla = \left(\frac{\partial}{\partial x_0}, \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\right),$$

$R$ is some $n$-dimensional region, and $\Omega$ is the "cylindrical space-time region"
$R \times [a, b]$, i.e., the Cartesian product of $R$ and the interval $[a, b]$ (see footnote
10, p. 164). The functions $L(u, \nabla u)$ and $\mathcal{L}(u, \nabla u)$ are called the Lagrangian
and Lagrangian density of the field, respectively. Applying the principle of
stationary action to (117), we require that $\delta J = 0$. This leads to the Euler
equations

$$\frac{\partial \mathcal{L}}{\partial u_j} - \sum_{i=0}^{n} \frac{\partial}{\partial x_i}\left(\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)}\right) = 0 \qquad (j = 1, \ldots, m), \tag{118}$$

which are the desired field equations.

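Example 1. For the vibrating string with free ends ($\kappa_1 = \kappa_2 = 0$), we
have $m = n = 1$, and

$$\mathcal{L} = \tfrac{1}{2}\left(\rho u_t^2 - \tau u_x^2\right) = \tfrac{1}{2}\left(\rho u_{x_0}^2 - \tau u_{x_1}^2\right)$$

[cf. formula (16)].

Example 2. For the vibrating membrane with a free boundary [$\kappa(s) = 0$]
we have $m = 1$, $n = 2$, and

$$\mathcal{L} = \tfrac{1}{2}\left[\rho u_t^2 - \tau\left(u_x^2 + u_y^2\right)\right] = \tfrac{1}{2}\left[\rho u_{x_0}^2 - \tau\left(u_{x_1}^2 + u_{x_2}^2\right)\right]$$

[cf. formula (42)].

Example 3. Consider the Klein-Gordon equation

$$(\Box - M^2)\, u(x) = 0, \tag{119}$$

describing the scalar field corresponding to uncharged particles of mass $M$
with spin zero (e.g., $\pi^0$-mesons). Here, $\Box$ denotes the D'Alembertian (operator)

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$

It is easy to see that (119) is the Euler equation corresponding to the Lagrangian
density

$$\mathcal{L} = \tfrac{1}{2}\left(u_{x_0}^2 - u_{x_1}^2 - u_{x_2}^2 - u_{x_3}^2 - M^2 u^2\right). \tag{120}$$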
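
As a numerical sanity check on the sign conventions in (119), the following sketch evaluates the left-hand side of the Klein-Gordon equation on a plane wave by central differences. The plane wave, the values of `M` and `K`, the step `h` and the function names are illustrative assumptions, not part of the text; the only fact used is that $u = \cos(k \cdot x - \omega x_0)$ satisfies (119) when $\omega^2 = |k|^2 + M^2$.

```python
import math

# Hypothetical test solution: a plane wave u = cos(k.x - w*x0) with w^2 = |k|^2 + M^2.
M = 1.5
K = (0.7, -0.4, 1.1)                      # spatial wave vector (illustrative values)
W = math.sqrt(sum(c * c for c in K) + M * M)

def u(x):
    """Plane wave evaluated at the space-time point x = (x0, x1, x2, x3)."""
    phase = sum(k * xi for k, xi in zip(K, x[1:])) - W * x[0]
    return math.cos(phase)

def second_derivative(f, x, i, h=1e-3):
    """Central-difference approximation to the second partial derivative in x_i."""
    up = list(x); up[i] += h
    dn = list(x); dn[i] -= h
    return (f(up) - 2.0 * f(x) + f(dn)) / (h * h)

def klein_gordon_residual(f, x):
    """Left-hand side of (119): (Box - M^2) f, with Box = -d_00 + d_11 + d_22 + d_33."""
    box = -second_derivative(f, x, 0) + sum(second_derivative(f, x, i) for i in (1, 2, 3))
    return box - M * M * f(x)

point = (0.3, -0.2, 0.5, 0.1)
residual = klein_gordon_residual(u, point)
print(f"residual of (119) at {point}: {residual:.2e}")
```

The residual is not exactly zero because of the finite-difference truncation error, which shrinks like $h^2$; flipping the sign of the time derivative in `klein_gordon_residual` makes it of order one instead.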


38.2. Conservation laws for fields. Noether's theorem (derived in Sec.
37.5) affords a general method of deriving conservation laws for fields, i.e.,
for constructing combinations of field functions, called field invariants,
which do not change in time. Thus, suppose the integral

$$\int_\Omega \mathcal{L}(u, \nabla u)\, dx$$

is invariant under an $r$-parameter family of transformations²⁴

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k \varphi_i^{(k)} \qquad (i = 0, 1, 2, 3),\\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k \psi_j^{(k)} \qquad (j = 1, \ldots, m),
\end{aligned} \tag{121}$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_r)$. Then, according to Remark 3, p. 179, we have $r$
relations of the form

$$\operatorname{div} I^{(k)} = \sum_{i=0}^{3} \frac{\partial I_i^{(k)}}{\partial x_i} = 0,$$

where

$$I_i^{(k)} = \sum_{j=1}^{m} \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)} \bar{\psi}_j^{(k)} + \mathcal{L} \varphi_i^{(k)} \qquad (k = 1, \ldots, r) \tag{122}$$

and

$$\bar{\psi}_j^{(k)} = \psi_j^{(k)} - \sum_{i=0}^{3} \frac{\partial u_j}{\partial x_i} \varphi_i^{(k)}.$$

These equations have the following interesting consequence: Suppose $\Omega$ is the
cylinder $R \times [a, b]$, where $R$ is the three-dimensional sphere defined by

$$x_1^2 + x_2^2 + x_3^2 \leq c^2.$$

Let $\Gamma$ be the boundary of $\Omega$, and let $\nu$ be the unit outward normal to $\Gamma$.
Then, integrating each of the relations $\operatorname{div} I^{(k)} = 0$ over $\Omega$ and using Green's
theorem [formula (5) of Sec. 35], we obtain

$$\int_\Omega \operatorname{div} I^{(k)}\, dx = \int_\Gamma (I^{(k)}, \nu)\, d\sigma = 0 \qquad (k = 1, \ldots, r). \tag{123}$$

The surface integral in (123) is the sum of an integral over the lateral surface
of the cylinder $\Omega$ and an integral over the two end surfaces cut off by the
planes $x_0 = a$, $x_0 = b$. As $c \to \infty$, the integral over the lateral surface
goes to zero (by the usual argument requiring that the field fall off at infinity
"sufficiently rapidly"), and we are left with the integral over the end surfaces.


24 From now on, we set $n = 3$.

On these surfaces, the scalar product $(I^{(k)}, \nu)$ reduces to $\pm I_0^{(k)}$, where the plus
sign refers to the "top" surface and the minus sign to the "bottom" surface.
Therefore, taking the limit as $c \to \infty$ in (123), we find that

$$\int I_0^{(k)}(a, x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 = \int I_0^{(k)}(b, x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 \qquad (k = 1, \ldots, r), \tag{124}$$

where $I_0^{(k)}$ denotes the $x_0$-component of the vector $I^{(k)}$, and the integrations
extend over all of three-dimensional space, as will always be assumed if no
region of integration is indicated. Since $a$ and $b$ are arbitrary, it follows
from (124) that the quantities

$$\int I_0^{(k)}\, dx_1\, dx_2\, dx_3 = \int \left[\sum_{j=1}^{m} \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u_j}{\partial x_0}\right)} \bar{\psi}_j^{(k)} + \mathcal{L} \varphi_0^{(k)}\right] dx_1\, dx_2\, dx_3 \qquad (k = 1, \ldots, r) \tag{125}$$

are independent of time. The $r$ quantities (125) are the required field invariants,
whose existence is implied by the invariance of the action functional
under the $r$-parameter family of transformations (121).


Remark. Of course, all the functions in (125) are supposed to be evaluated
on an extremal surface of the action functional, corresponding to a solution
$u(x)$ of the field equations (118).


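38.3. Conservation of energy and momentum. The action functional of
any physical field is invariant under parallel displacements, i.e., under the
family of transformations

$$\begin{aligned}
x_i^* &= x_i + \varepsilon_i \qquad (i = 0, 1, 2, 3),\\
u_j^* &= u_j \qquad (j = 1, \ldots, m),
\end{aligned} \tag{126}$$

where the $\varepsilon_i$ are arbitrary. In this case, we have

$$\delta x_i = \varepsilon_i, \qquad \delta u_j = 0,$$

which implies

$$\varphi_i^{(k)} = \delta_{ik}, \qquad \psi_j^{(k)} = 0, \qquad \bar{\psi}_j^{(k)} = -\frac{\partial u_j}{\partial x_k},$$

where $\delta_{ik}$ is the Kronecker delta. According to (125), the corresponding
field invariants are, apart from an overall sign which does not affect their invariance,

$$\int \left[\sum_{j=1}^{m} \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u_j}{\partial x_0}\right)} \frac{\partial u_j}{\partial x_k} - \mathcal{L}\delta_{0k}\right] dx_1\, dx_2\, dx_3 \qquad (k = 0, 1, 2, 3).$$

It is convenient to introduce the second-rank tensor

$$T_{ik} = \sum_{j=1}^{m} \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u_j}{\partial x_i}\right)} \frac{\partial u_j}{\partial x_k} - \mathcal{L}\delta_{ik}, \tag{127}$$

called the energy-momentum tensor. In terms of $T_{ik}$, the field invariants are

$$P_k = \int T_{0k}\, dx_1\, dx_2\, dx_3 \qquad (k = 0, 1, 2, 3).$$

The vector

$$P = (P_0, P_1, P_2, P_3)$$

is called the energy-momentum vector, and in fact, it can be shown that $P_0$ is
the energy and $P_1, P_2, P_3$ the momentum components of the field. Thus,
since $P$ is a field invariant, we have just proved that the energy and momentum
of the field are conserved.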


38.4. Conservation of angular momentum. According to the special
theory of relativity, the action functional of any physical field is invariant
under orthochronous Lorentz transformations, i.e., under transformations of
four-dimensional space-time which leave the quadratic form

$$-x_0^2 + x_1^2 + x_2^2 + x_3^2$$

invariant and preserve the time direction.²⁵ For simplicity, we consider
the case where $u(x)$ is a scalar field ($m = 1$). Then the action functional
must be invariant under the family of (infinitesimal) transformations

$$\begin{aligned}
x_i^* &\sim x_i + \sum_{l \neq i} g_{ii}\,\varepsilon_{il}\, x_l,\\
u^* &= u,
\end{aligned} \tag{128}$$

where

$$g_{00} = -1, \qquad g_{11} = g_{22} = g_{33} = 1$$

and

$$\varepsilon_{kl} = -\varepsilon_{lk} \qquad (k \neq l) \tag{129}$$

are the parameters determining the given transformation.²⁶ Since the
twelve parameters $\varepsilon_{kl}$ ($k \neq l$) are connected by the relations (129), only six
of them are independent, and we choose the independent parameters to be
those for which $k < l$.


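25 The determinant of the matrix corresponding to a Lorentz transformation equals
$\pm 1$, where the plus sign corresponds to the so-called proper Lorentz transformations.
See e.g., V. I. Smirnov, Linear Algebra and Group Theory, translated by R. A. Silverman,
McGraw-Hill Book Co., Inc., New York (1961), Chap. 7.

26 The parameters $\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{23}$ are angles of rotation, while $\varepsilon_{01}, \varepsilon_{02}, \varepsilon_{03}$ are certain
expressions involving the velocity of light and the velocity of one physical reference
frame with respect to the other.

Corresponding to the transformations (128), we have

$$\begin{aligned}
\delta x_i &= \sum_{l \neq i} g_{ii}\,\varepsilon_{il}\, x_l = \sum_{k=0}^{3} \sum_{l \neq k} g_{ii}\,\delta_{ik}\,\varepsilon_{kl}\, x_l\\
&= \sum_{k<l} g_{ii}\,\delta_{ik}\,\varepsilon_{kl}\, x_l + \sum_{k>l} g_{ii}\,\delta_{ik}\,\varepsilon_{kl}\, x_l\\
&= \sum_{k<l} \varepsilon_{kl}\left(g_{ii}\,\delta_{ik}\, x_l - g_{ii}\,\delta_{il}\, x_k\right),
\end{aligned}$$

where $\delta_{ik}$ is the Kronecker delta, and

$$\delta u = 0.$$

It follows that

$$\varphi_i^{(kl)} = g_{ii}\,\delta_{ik}\, x_l - g_{ii}\,\delta_{il}\, x_k, \qquad \psi^{(kl)} = 0,$$

$$\bar{\psi}^{(kl)} = -\sum_{i=0}^{3} \frac{\partial u}{\partial x_i}\left(g_{ii}\,\delta_{ik}\, x_l - g_{ii}\,\delta_{il}\, x_k\right) = \frac{\partial u}{\partial x_l}\, g_{ll}\, x_k - \frac{\partial u}{\partial x_k}\, g_{kk}\, x_l,$$

where the pair of indices $k, l$ plays the same role as the single index $k$ in (121)
and ranges over the six combinations

0,1; 0,2; 0,3; 1,2; 1,3; 2,3.

According to (125), the corresponding field invariants are

$$\int \left\{\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u}{\partial x_0}\right)}\left[\frac{\partial u}{\partial x_l}\, g_{ll}\, x_k - \frac{\partial u}{\partial x_k}\, g_{kk}\, x_l\right] + \mathcal{L}\left[g_{00}\,\delta_{0k}\, x_l - g_{00}\,\delta_{0l}\, x_k\right]\right\} dx_1\, dx_2\, dx_3 \qquad (k < l). \tag{130}$$

It is convenient to introduce the third-rank tensor

$$\begin{aligned}
M_{ikl} &= \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial u}{\partial x_i}\right)}\left[\frac{\partial u}{\partial x_l}\, g_{ll}\, x_k - \frac{\partial u}{\partial x_k}\, g_{kk}\, x_l\right] + \mathcal{L}\left[g_{ii}\,\delta_{ik}\, x_l - g_{ii}\,\delta_{il}\, x_k\right] \qquad (k < l),\\
M_{ikl} &= -M_{ilk} \qquad (k > l),
\end{aligned} \tag{131}$$

called the angular momentum tensor. By definition, $M_{ikl}$ is antisymmetric
in the indices $k$ and $l$. Using the expression (127) for the energy-momentum
tensor (specialized to the case of scalar fields), we can write (131) as

$$M_{ikl} = g_{ll}\, x_k\, T_{il} - g_{kk}\, x_l\, T_{ik}.$$

In terms of $M_{ikl}$, the field invariants are

$$\int M_{0kl}\, dx_1\, dx_2\, dx_3 \qquad (k < l),$$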


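a fact summarized by saying that the angular momentum of the field is
conserved.

Example. Using the quantities $g_{ii}$, we can write the Lagrangian density
(120) corresponding to the Klein-Gordon equation in the form

$$\mathcal{L} = -\frac{1}{2}\sum_{i=0}^{3} g_{ii}\left(\frac{\partial u}{\partial x_i}\right)^2 - \frac{1}{2} M^2 u^2.$$

This leads to the energy-momentum tensor

$$T_{ik} = -g_{ii}\,\frac{\partial u}{\partial x_i}\frac{\partial u}{\partial x_k} - \mathcal{L}\delta_{ik} \tag{132}$$

and the angular momentum tensor

$$M_{ikl} = -g_{ii}\,\frac{\partial u}{\partial x_i}\left(g_{ll}\, x_k \frac{\partial u}{\partial x_l} - g_{kk}\, x_l \frac{\partial u}{\partial x_k}\right) + \mathcal{L}\left(g_{kk}\, x_l\, \delta_{ik} - g_{ll}\, x_k\, \delta_{il}\right).$$

The energy density corresponding to (132) is

$$T_{00} = \frac{1}{2}\sum_{i=0}^{3}\left(\frac{\partial u}{\partial x_i}\right)^2 + \frac{1}{2} M^2 u^2,$$

while the momentum density has the components

$$T_{0k} = \frac{\partial u}{\partial x_0}\frac{\partial u}{\partial x_k} \qquad (k = 1, 2, 3).$$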
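
The conservation of $P_0$ can be illustrated numerically. The sketch below is a simplified stand-in for the situation in the text: it assumes a single space dimension, the finite interval $0 \leq x_1 \leq \pi$ with the field vanishing at both ends (in place of a field decaying at infinity), and the explicit standing-wave solution $u = \cos(\omega x_0)\sin(k x_1)$ with $\omega^2 = k^2 + M^2$. The values of `M` and `K` and the function names are illustrative.

```python
import math

M = 2.0                       # mass parameter (illustrative)
K = 3                         # integer wave number, so u vanishes at x1 = 0 and x1 = pi
W = math.sqrt(K * K + M * M)  # frequency fixed by the Klein-Gordon equation

def energy_density(x0, x1):
    """T_00 = (u_x0^2 + u_x1^2 + M^2 u^2)/2 for u = cos(W x0) sin(K x1)."""
    u = math.cos(W * x0) * math.sin(K * x1)
    u_t = -W * math.sin(W * x0) * math.sin(K * x1)
    u_x = K * math.cos(W * x0) * math.cos(K * x1)
    return 0.5 * (u_t * u_t + u_x * u_x + M * M * u * u)

def total_energy(x0, cells=2000):
    """Midpoint-rule approximation to the integral of T_00 over 0 <= x1 <= pi."""
    h = math.pi / cells
    return h * sum(energy_density(x0, (i + 0.5) * h) for i in range(cells))

times = (0.0, 0.4, 1.3, 2.9)
energies = [total_energy(t) for t in times]
for t, e in zip(times, energies):
    print(f"x0 = {t:3.1f}   P_0 = {e:.12f}")
```

For this solution the integral can also be done by hand and equals $\pi\omega^2/4$ at every time, which is what the printed values should reproduce: the kinetic and potential parts of $T_{00}$ trade off against each other while their sum stays fixed.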


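38.5. The electromagnetic field. To illustrate the methods developed
above, we now derive the equations of the electromagnetic field from a
suitable Lagrangian density. The electromagnetic field is described by two
three-dimensional vectors, the electric field vector $E = (E_1, E_2, E_3)$ and the
magnetic field vector $H = (H_1, H_2, H_3)$. In the absence of electric charges,
$E$ and $H$ are related by the familiar Maxwell equations

$$\begin{aligned}
\operatorname{curl} E &= -\frac{\partial H}{\partial x_0}, & \operatorname{curl} H &= \frac{\partial E}{\partial x_0},\\
\operatorname{div} H &= 0, & \operatorname{div} E &= 0,
\end{aligned} \tag{133}$$

where

$$\operatorname{div} E = \frac{\partial E_1}{\partial x_1} + \frac{\partial E_2}{\partial x_2} + \frac{\partial E_3}{\partial x_3},$$

$$\operatorname{curl} E = \left(\frac{\partial E_3}{\partial x_2} - \frac{\partial E_2}{\partial x_3},\; \frac{\partial E_1}{\partial x_3} - \frac{\partial E_3}{\partial x_1},\; \frac{\partial E_2}{\partial x_1} - \frac{\partial E_1}{\partial x_2}\right),$$

and similarly for $\operatorname{div} H$, $\operatorname{curl} H$. It is convenient to express $E$ and $H$ in
terms of a four-dimensional electromagnetic potential $\{A_i\} = (A_0, A_1, A_2, A_3)$,²⁷
by setting

$$E = \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}, \qquad H = \operatorname{curl} A, \tag{134}$$

27 Since the symbol $A$ is reserved for the three-dimensional vector $(A_1, A_2, A_3)$, we
denote the four-dimensional vector $(A_0, A_1, A_2, A_3)$ by $\{A_i\}$. $A$ is sometimes called the
vector potential and $A_0$ the scalar potential.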


where

$$A = (A_1, A_2, A_3)$$

and

$$\operatorname{grad} A_0 = \left(\frac{\partial A_0}{\partial x_1}, \frac{\partial A_0}{\partial x_2}, \frac{\partial A_0}{\partial x_3}\right).$$

The potential $\{A_i\}$ is not uniquely determined by the vectors $E$ and $H$.
In fact, $E$ and $H$ do not change if we make a gauge transformation, i.e., if
we replace $\{A_i\}$ by a new potential $\{A_i'\}$ with components

$$A_i'(x) = A_i(x) + \frac{\partial f(x)}{\partial x_i} \qquad (i = 0, 1, 2, 3),$$

where $x = (x_0, x_1, x_2, x_3)$ and $f(x)$ is an arbitrary function. To avoid this
lack of uniqueness, an extra condition can be imposed on $\{A_i\}$. The
condition usually chosen is

$$-\frac{\partial A_0}{\partial x_0} + \operatorname{div} A = \sum_{i=0}^{3} g_{ii}\,\frac{\partial A_i}{\partial x_i} = 0 \tag{135}$$

and is known as the Lorentz condition.


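Next, we prove that the Maxwell equations (133) reduce to a single equation
determining the electromagnetic potential $\{A_i\}$. First, we introduce the
antisymmetric tensor $H_{ij}$, whose matrix

$$\|H_{ij}\| = \begin{pmatrix}
0 & -E_1 & -E_2 & -E_3\\
E_1 & 0 & H_3 & -H_2\\
E_2 & -H_3 & 0 & H_1\\
E_3 & H_2 & -H_1 & 0
\end{pmatrix}$$

is formed from the components of $E$ and $H$. It is easily verified that the
formula relating $H_{ij}$ to the potential $\{A_i\}$ is

$$H_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}. \tag{136}$$

In terms of the tensor $H_{ij}$, we can write the Maxwell equations (133) in the
form

$$\sum_{i=0}^{3} g_{ii}\,\frac{\partial H_{ij}}{\partial x_i} = 0 \qquad (j = 0, 1, 2, 3), \tag{137}$$

$$\frac{\partial H_{ij}}{\partial x_k} + \frac{\partial H_{jk}}{\partial x_i} + \frac{\partial H_{ki}}{\partial x_j} = 0, \tag{138}$$

where in (138), the indices $i, j, k$ take the values

0, 1, 2;   1, 2, 3;   2, 3, 0;   3, 0, 1.

Substituting (136) into (137) and (138), and using the Lorentz condition (135),
we find that (138) is an identity, while (137) reduces to

$$\Box A_j = 0 \qquad (j = 0, 1, 2, 3), \tag{139}$$

where $\Box$ is the D'Alembertian

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$

Finally, we show that (139) is a consequence of the principle of stationary
action,²⁸ if we choose the Lagrangian density of the electromagnetic field
to be

$$\mathcal{L} = \frac{1}{8\pi}\left(E^2 - H^2\right). \tag{140}$$

Replacing $E$ and $H$ in (140) by their expressions (134) in terms of the
electromagnetic potential $\{A_i\}$, we obtain

$$\mathcal{L} = \frac{1}{8\pi}\left[\left(\operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}\right)^2 - (\operatorname{curl} A)^2\right]. \tag{141}$$

We shall only verify that the Euler equations

$$\frac{\partial \mathcal{L}}{\partial A_j} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i}\left(\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_j}{\partial x_i}\right)}\right) = 0 \qquad (j = 0, 1, 2, 3) \tag{142}$$

corresponding to (141) can be reduced to the form (139) for the component
$A_0$, since the calculations for $A_1, A_2, A_3$ are completely analogous. It
follows from (141) that

$$\frac{\partial \mathcal{L}}{\partial A_0} = 0, \qquad \frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_0}{\partial x_0}\right)} = 0,$$

$$\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_0}{\partial x_1}\right)} = \frac{1}{4\pi}\left(\frac{\partial A_0}{\partial x_1} - \frac{\partial A_1}{\partial x_0}\right),$$

$$\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_0}{\partial x_2}\right)} = \frac{1}{4\pi}\left(\frac{\partial A_0}{\partial x_2} - \frac{\partial A_2}{\partial x_0}\right),$$

$$\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_0}{\partial x_3}\right)} = \frac{1}{4\pi}\left(\frac{\partial A_0}{\partial x_3} - \frac{\partial A_3}{\partial x_0}\right).$$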


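28 Provided $\{A_i\}$ satisfies the Lorentz condition.

Thus, for $j = 0$, (142) becomes

$$\frac{\partial \mathcal{L}}{\partial A_0} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i}\left(\frac{\partial \mathcal{L}}{\partial\left(\dfrac{\partial A_0}{\partial x_i}\right)}\right) = -\frac{1}{4\pi}\left[\frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} - \frac{\partial}{\partial x_0}\left(\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3}\right)\right] = 0. \tag{143}$$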
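According to the Lorentz condition (135),

$$\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} = \frac{\partial A_0}{\partial x_0},$$

and hence (143) reduces to

$$-\frac{\partial^2 A_0}{\partial x_0^2} + \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} = \Box A_0 = 0,$$

which is just (139), for $j = 0$.

Remark 1. In deriving (139) from (141), we made use of the Lorentz
condition (135). Instead, we could have introduced an additional term
into the Lagrangian density by writing

$$\mathcal{L} = \frac{1}{8\pi}\left\{\left(\operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}\right)^2 - (\operatorname{curl} A)^2 - \left(\operatorname{div} A - \frac{\partial A_0}{\partial x_0}\right)^2\right\}, \tag{144}$$

which reduces to (141) if the Lorentz condition is satisfied. The Euler
equations corresponding to (144) reduce to (139) for arbitrary $\{A_i\}$.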


Remark 2. The Lagrangian density of the electromagnetic field, and hence 
its action functional, is invariant under parallel displacements, Lorentz 
transformations and gauge transformations. According to Sec. 38.3, the 
invariance under parallel displacements implies conservation of energy and 
momentum of the field, while, according to Sec. 38.4, the invariance under 
Lorentz transformations implies conservation of angular momentum of the 
field. Moreover, according to Remark 4, p. 179, the invariance under gauge 
transformations (which depend on one arbitrary function) implies the exis- 
tence of a relation between the left-hand sides of the corresponding Euler 
equations (139). Therefore, these equations do not uniquely determine
the electromagnetic potential $\{A_i\}$. In fact, to determine $\{A_i\}$ uniquely,
we need an extra equation, which is usually chosen to be the Lorentz condition
(135).²⁹

29 The Maxwell equations are actually invariant under a 15-parameter family (group)
of transformations. In addition to the 10 conservation laws already mentioned (energy,
momentum and angular momentum), this invariance leads to 5 more conservation laws,
which, however, do not have direct physical meaning. For a detailed treatment of this
problem, see E. Bessel-Hagen, Über die Erhaltungssätze der Elektrodynamik, Math. Ann.,
84, 258 (1921).

PROBLEMS


1. Find the Euler equation of the functional

$$J[u] = \int \cdots \int_R \sum_{i=1}^{n} u_{x_i}^2\, dx_1 \cdots dx_n.$$

2. Find the Euler equation of the functional

$$J[u] = \iiint_R \sqrt{1 + u_x^2 + u_y^2 + u_z^2}\; dx\, dy\, dz.$$

3. Write the appropriate generalization of the Euler equation for the
functional

$$J[u] = \iint_R F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy})\, dx\, dy.$$


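4. Starting from Green's theorem

$$\iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\, dy = \int_\Gamma (P\, dx + Q\, dy),$$

prove that

$$\iint_R \varphi\,\psi_{xx}\, dx\, dy = \iint_R \psi\,\varphi_{xx}\, dx\, dy + \int_\Gamma \left(\varphi\,\psi_x - \psi\,\varphi_x\right) dy,$$

$$\iint_R \varphi\,\psi_{yy}\, dx\, dy = \iint_R \psi\,\varphi_{yy}\, dx\, dy - \int_\Gamma \left(\varphi\,\psi_y - \psi\,\varphi_y\right) dx,$$

$$\iint_R \varphi\,\psi_{xy}\, dx\, dy = \iint_R \psi\,\varphi_{xy}\, dx\, dy - \frac{1}{2}\int_\Gamma \left(\varphi\,\psi_x - \psi\,\varphi_x\right) dx + \frac{1}{2}\int_\Gamma \left(\varphi\,\psi_y - \psi\,\varphi_y\right) dy.$$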


5. Let J[u] be the functional 
ty 
i [ [—(uer + yy)? + 20 — w)\Uecttyy — u2,)) dx dy dt, 


Using the result of the preceding problem, prove that if we go from $u$ to $u + \varepsilon\psi$,
then


By = ah ff (—V4uyh dx dy dt + eff [Pes + Mw) 3] ds dt, 
where M(u) and P(u) are given by formulas (61) and (62). 


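Hint. Express $\partial\psi/\partial x$, $\partial\psi/\partial y$ in terms of $\partial\psi/\partial n$, $\partial\psi/\partial s$, and use integration
by parts to get rid of $\partial\psi/\partial s$.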


6. Show that when a = 1, formula (105) of Sec. 37.4 reduces to formula (7) 
of Sec. 13. 


7. Given the functional 
ou 
Fie / | dx dy, 


compute J[o*] if o* is obtained from o by the transformation (108).

8. Derive the Euler equations corresponding to the Lagrangian density
3 3 3 3 
ou 2 2,,2 0A; 2 2 
L = 2 (5a at eA.) + M2u? + > 2 Se (=) i My «42, 
where the field variables are $u, A_0, A_1, A_2, A_3$, and the factor $e_i$ equals 1
if $i = 0$ and $-1$ if $i = 1, 2, 3$.


9. Show that the Lagrangian density $\mathcal{L}$ of the preceding problem is Lorentz-invariant
if $u$ transforms like a scalar and if $A_0, A_1, A_2, A_3$ transform like the
components of a vector under Lorentz transformations. Use this fact to
derive various conservation laws for the field described by $\mathcal{L}$.


8 


DIRECT METHODS 
IN THE 
CALCULUS OF VARIATIONS 


So far, the basic approach used to solve a given variational problem 
(and indeed, to prove the existence of a solution) has been to reduce the prob- 
lem to one involving a differential equation (or perhaps a system of differen- 
tial equations). However, this approach is not always effective, and is 
greatly complicated by the fact that what is needed to solve a given varia- 
tional problem is not a solution of the corresponding differential equation 
in a small neighborhood of some point (as is usually the case in the theory of 
differential equations), but rather a solution in some fixed region R, which 
satisfies prescribed boundary conditions on the boundary of R. The 
difficulties inherent in this approach (especially when several independent 
variables are involved, so that the differential equation is a partial differential 
equation) have led to a search for variational methods of a different kind, 
known as direct methods, which do not entail the reduction of variational 
problems to problems involving differential equations. 

Once they have been developed, direct variational methods can be used to
solve differential equations, and this technique, the inverse of the one we
have used until now, plays an important role in the modern theory of the
subject. The basic idea is the following: Suppose it can be shown that a
given differential equation is the Euler equation of some functional, and
suppose it has been proved somehow that this functional has an extremum
for a sufficiently smooth admissible function. Then, this very fact proves
that the differential equation has a solution satisfying the boundary conditions
corresponding to the given variational problem. Moreover, as we
shall show below (Sec. 41), variational methods can be used not only to
prove the existence of a solution of the original differential equation, but also
to calculate a solution to any desired accuracy.


39. Minimizing Sequences 


There are many different techniques lumped together under the heading 
of ‘“‘direct methods.” However, the direct methods considered here are all 
based on the same general idea, which goes as follows: 

Consider the problem of finding the minimum of a functional $J[y]$ defined
on a space $\mathcal{M}$ of admissible functions $y$. For the problem to make sense,
it must be assumed that there are functions in $\mathcal{M}$ for which $J[y] < +\infty$,
and moreover that¹

$$\inf_y J[y] = \mu > -\infty, \tag{1}$$

where the greatest lower bound is taken over all admissible $y$. Then, by
the definition of $\mu$, there exists an infinite sequence of functions
$\{y_n\} = y_1, y_2, \ldots$, called a minimizing sequence, such that

$$\lim_{n \to \infty} J[y_n] = \mu.$$

If the sequence $\{y_n\}$ has a limit function $\hat{y}$, and if it is legitimate to write

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n], \tag{2}$$

i.e.,

$$J[\lim_{n \to \infty} y_n] = \lim_{n \to \infty} J[y_n],$$

then

$$J[\hat{y}] = \mu,$$

and $\hat{y}$ is the solution of the variational problem. Moreover, the functions
of the minimizing sequence $\{y_n\}$ can be regarded as approximate solutions
of our problem.


Thus, to solve a given variational problem by the direct method, we must

1. Construct a minimizing sequence $\{y_n\}$;

2. Prove that $\{y_n\}$ has a limit function $\hat{y}$;

3. Prove the legitimacy of taking the limit (2).

Remark 1. Two direct methods, the Ritz method and the method of finite
differences, each involving the construction of a minimizing sequence, will
be discussed in the next section. We reiterate that a minimizing sequence
can always be constructed if (1) holds.


1 By inf is meant the greatest lower bound or infimum.

Remark 2. Even if a minimizing sequence $\{y_n\}$ exists for a given variational
problem, it may not have a limit function $\hat{y}$. For example, consider
the functional

$$J[y] = \int_{-1}^{1} x^2 y'^2\, dx,$$

where

$$y(-1) = -1, \qquad y(1) = 1. \tag{3}$$

Obviously, $J[y]$ takes only positive values and

$$\inf J[y] = 0.$$

We can choose

$$y_n(x) = \frac{\tan^{-1} nx}{\tan^{-1} n} \qquad (n = 1, 2, \ldots) \tag{4}$$

as the minimizing sequence, since

$$\int_{-1}^{1} \frac{n^2 x^2\, dx}{(\tan^{-1} n)^2 (1 + n^2 x^2)^2} < \frac{1}{(\tan^{-1} n)^2} \int_{-1}^{1} \frac{dx}{1 + n^2 x^2} = \frac{2}{n \tan^{-1} n},$$

and hence $J[y_n] \to 0$ as $n \to \infty$. But as $n \to \infty$, the sequence (4) has no limit
in the class of continuous functions satisfying the boundary conditions (3).
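
The behavior of this minimizing sequence is easy to observe numerically. The following sketch evaluates $J[y_n]$ for the functions (4) with a midpoint rule and compares the result with the bound $2/(n \tan^{-1} n)$; the quadrature rule, the number of cells and the function names are choices made here for illustration only.

```python
import math

def J(n, cells=20000):
    """Midpoint-rule value of J[y_n] = integral_{-1}^{1} x^2 y_n'(x)^2 dx
    for y_n(x) = arctan(n x) / arctan(n)."""
    h = 2.0 / cells
    total = 0.0
    for i in range(cells):
        x = -1.0 + (i + 0.5) * h
        dy = n / ((1.0 + (n * x) ** 2) * math.atan(n))   # y_n'(x)
        total += x * x * dy * dy
    return total * h

def bound(n):
    """Upper bound 2 / (n arctan n) obtained in the text."""
    return 2.0 / (n * math.atan(n))

values = {n: J(n) for n in (1, 10, 100)}
for n, value in values.items():
    print(f"n = {n:3d}   J[y_n] = {value:.6f}   bound = {bound(n):.6f}")
```

The values decrease toward zero while staying below the bound, yet the functions themselves steepen near $x = 0$ and approach the discontinuous sign function, which is why no admissible limit exists.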

Even if the minimizing sequence $\{y_n\}$ has a limit $\hat{y}$ in the sense of the
$\mathcal{C}$-norm (i.e., $y_n \to \hat{y}$ as $n \to \infty$, without any assumptions about the convergence
of the derivatives of $y_n$), it is still no trivial matter to justify taking the limit
(2), since in general, the functionals considered in the calculus of variations
are not continuous in the $\mathcal{C}$-norm. However, (2) still holds if continuity
of $J[y]$ is replaced by a weaker condition:


THEOREM. If $\{y_n\}$ is a minimizing sequence of the functional $J[y]$, with
limit function $\hat{y}$, and if $J[y]$ is lower semicontinuous at $\hat{y}$,² then

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n].$$

Proof. On the one hand,

$$J[\hat{y}] \geq \lim_{n \to \infty} J[y_n] = \inf J[y], \tag{5}$$

while, on the other hand, given any $\varepsilon > 0$,

$$J[y_n] - J[\hat{y}] > -\varepsilon, \tag{6}$$

if $n$ is sufficiently large. Letting $n \to \infty$ in (6), we obtain

$$J[\hat{y}] \leq \lim_{n \to \infty} J[y_n] + \varepsilon,$$

or

$$J[\hat{y}] \leq \lim_{n \to \infty} J[y_n], \tag{7}$$

since $\varepsilon$ is arbitrary. Comparing (5) and (7), we find that

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n],$$

as asserted.

2 See Remark 1, p. 7.


40. The Ritz Method and the Method of Finite Differences³


40.1. First, we describe the Ritz method, one of the most widely used direct
variational methods. Suppose we are looking for the minimum of a functional
$J[y]$ defined on some space $\mathcal{M}$ of admissible functions, which for
simplicity we take to be a normed linear space. Let

$$\varphi_1, \varphi_2, \ldots \tag{8}$$

be an infinite sequence of functions in $\mathcal{M}$, and let $\mathcal{M}_n$ be the $n$-dimensional
linear subspace of $\mathcal{M}$ spanned by the first $n$ of the functions (8), i.e., the set
of all linear combinations of the form

$$\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n, \tag{9}$$

where $\alpha_1, \ldots, \alpha_n$ are arbitrary real numbers. Then, on each subspace $\mathcal{M}_n$,
the functional $J[y]$ leads to a function

$$J[\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n] \tag{10}$$

of the $n$ variables $\alpha_1, \ldots, \alpha_n$.

Next, we choose $\alpha_1, \ldots, \alpha_n$ in such a way as to minimize (10), denoting
the minimum by $\mu_n$ and the element of $\mathcal{M}_n$ which yields the minimum by $y_n$.
(In principle, this is a much simpler problem than finding the minimum of the
functional $J[y]$ itself.) Clearly, $\mu_n$ cannot increase with $n$, i.e.,

$$\mu_1 \geq \mu_2 \geq \cdots,$$

since any linear combination of $\varphi_1, \ldots, \varphi_n$ is automatically a linear combination
of $\varphi_1, \ldots, \varphi_n, \varphi_{n+1}$. Correspondingly, each subspace of the sequence

$$\mathcal{M}_1, \mathcal{M}_2, \ldots$$

is contained in the next. We now give conditions which guarantee that the
sequence $\{y_n\}$ is a minimizing sequence.


DEFINITION. The sequence (8) is said to be complete (in $\mathcal{M}$) if given
any $y \in \mathcal{M}$ and any $\varepsilon > 0$, there is a linear combination $\eta_n$ of the form (9)
such that $\|\eta_n - y\| < \varepsilon$ (where $n$ depends on $\varepsilon$).


3 Here we merely outline these two methods, without worrying about questions of
convergence, and taking for granted the existence of an exact solution of the given
variational problem.

THEOREM. If the functional $J[y]$ is continuous,⁴ and if the sequence (8)
is complete, then

$$\lim_{n \to \infty} \mu_n = \mu,$$

where

$$\mu = \inf_y J[y].$$

Proof. Given any $\varepsilon > 0$, let $y^*$ be such that

$$J[y^*] < \mu + \varepsilon.$$

(Such a $y^*$ exists for any $\varepsilon > 0$, by the definition of $\mu$.) Since $J[y]$ is
continuous,

$$|J[y] - J[y^*]| < \varepsilon, \tag{11}$$

provided that $\|y - y^*\| < \delta = \delta(\varepsilon)$. Let $\eta_n$ be a linear combination of
the form (9) such that $\|\eta_n - y^*\| < \delta$. (Such an $\eta_n$ exists for sufficiently
large $n$, since $\{\varphi_n\}$ is complete.) Moreover, let $y_n$ be the linear
combination of the form (9) for which (10) achieves its minimum.
Then, using (11), we find that

$$\mu \leq J[y_n] \leq J[\eta_n] < \mu + 2\varepsilon.$$

Since $\varepsilon$ is arbitrary, it follows that

$$\lim_{n \to \infty} J[y_n] = \lim_{n \to \infty} \mu_n = \mu,$$

as asserted.


Remark 1. The geometric idea of the proof is the following: If $\{\varphi_n\}$ is
complete, then any element in the infinite-dimensional space $\mathcal{M}$ can be
approximated arbitrarily closely by an element in the finite-dimensional
space $\mathcal{M}_n$ (for large enough $n$). We can summarize this fact by writing

$$\lim_{n \to \infty} \mathcal{M}_n = \mathcal{M}.$$

Let $\hat{y}$ be the element in $\mathcal{M}$ for which $J[\hat{y}] = \mu$, and let $\hat{y}_n \in \mathcal{M}_n$ be a sequence
of functions converging to $\hat{y}$. Then $\{\hat{y}_n\}$ is a minimizing sequence, since
$J[y]$ is continuous. Although this minimizing sequence cannot be constructed
without prior knowledge of $\hat{y}$, we can show that our explicitly
constructed sequence $\{y_n\}$ takes values $J[y_n]$ arbitrarily close to $J[\hat{y}_n]$, and
hence is itself a minimizing sequence.


Remark 2. The speed of convergence of the Ritz method for a given
variational problem obviously depends both on the problem itself and on
the choice of the functions $\varphi_n$. However, it should be pointed out that
in many cases, linear combinations involving only a very small number of
functions $\varphi_n$ are enough to give a quite satisfactory approximation to the
exact solution.

4 I.e., continuous in the norm of $\mathcal{M}$. For example, functionals of the form

$$J[y] = \int_a^b F(x, y, y')\, dx$$

are continuous in the norm of the space $\mathcal{D}_1(a, b)$.
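
To see how few basis functions may be needed, here is a small worked instance of the Ritz method. It is an illustration chosen for this purpose rather than an example from the text: the functional is $J[y] = \int_0^1 (\tfrac{1}{2} y'^2 - y)\, dx$ with $y(0) = y(1) = 0$, whose exact minimizer is $y = x(1 - x)/2$ with $J = -1/24$, and the basis is $\varphi_k(x) = \sin k\pi x$. Because this basis is orthogonal for the functional in question, the function (10) splits into independent one-variable quadratics, which the code minimizes in closed form.

```python
import math

def ritz_minimum(n):
    """Minimum mu_n of J[y] = integral_0^1 (y'^2/2 - y) dx over combinations
    a_1 sin(pi x) + ... + a_n sin(n pi x), together with the optimal a_k."""
    coeffs, mu = [], 0.0
    for k in range(1, n + 1):
        quad = (k * math.pi) ** 2 / 4.0                      # coefficient of a_k^2
        lin = (1.0 - math.cos(k * math.pi)) / (k * math.pi)  # integral of sin(k pi x)
        a_k = lin / (2.0 * quad)                             # minimizes quad*a^2 - lin*a
        coeffs.append(a_k)
        mu += quad * a_k * a_k - lin * a_k
    return mu, coeffs

def ritz_function(coeffs, x):
    """Value at x of the Ritz approximation with the given coefficients."""
    return sum(a * math.sin((k + 1) * math.pi * x) for k, a in enumerate(coeffs))

exact_mu = -1.0 / 24.0               # value of J at the exact minimizer x(1 - x)/2
for n in (1, 3, 5):
    mu_n, coeffs = ritz_minimum(n)
    err = abs(ritz_function(coeffs, 0.5) - 0.125)
    print(f"n = {n}   mu_n = {mu_n:.6f}   mu_n - mu = {mu_n - exact_mu:.2e}   "
          f"|y_n(1/2) - y(1/2)| = {err:.2e}")
```

Already for $n = 1$ the minimum is within about 1.5 percent of $-1/24$, and the sequence $\mu_n$ never increases, as the general argument requires. For a basis that is not orthogonal in this sense, minimizing (10) would instead require solving a linear system for the coefficients.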


Remark 3. More generally, the spaces $\mathcal{M}$ and $\mathcal{M}_n$ need not be normed
linear spaces themselves, but only suitable sets of admissible functions
belonging to an underlying normed linear space $\mathcal{R}$ (see Remark 3, p. 8).
For example, the admissible functions may satisfy boundary conditions like

$$y(a) = A, \qquad y(b) = B$$

(see Sec. 40.2), or a subsidiary condition like

$$\int_a^b y^2(x)\, dx = 1$$

(see Sec. 41). This case can be handled by appropriate modifications of
the present method.


40.2. We now describe another method involving a sequence of finite-dimensional
approximations to the space $\mathcal{M}$. This is the method of finite
differences, which has already been encountered in Sec. 7. There, in connection
with the derivation of Euler's equation, we noted that the problem
of finding an extremum of the functional⁵

$$J[y] = \int_a^b F(x, y, y')\, dx, \qquad y(a) = A, \quad y(b) = B, \tag{12}$$

can be approximated by the problem of finding an extremum of a function
of $n$ variables, obtained as follows: We divide the interval $[a, b]$ into $n + 1$
equal subintervals by introducing the points

$$x_0 = a,\; x_1, \ldots, x_n,\; x_{n+1} = b, \qquad x_{i+1} - x_i = \Delta x,$$

and we replace the function $y(x)$ by the polygonal line with vertices

$$(x_0, y_0),\; (x_1, y_1),\; \ldots,\; (x_n, y_n),\; (x_{n+1}, y_{n+1}),$$

where now $y_i = y(x_i)$. Then (12) can be approximated by the sum

$$J(y_1, \ldots, y_n) = \sum_{i=0}^{n} F\left(x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x}\right) \Delta x, \tag{13}$$

which is a function of $n$ variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are
fixed.) If for each $n$, we find the polygonal line minimizing (13), we obtain
a sequence of approximate solutions to the original variational problem.
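
The method can be tried on a case where the exact answer is known. The sketch below assumes the illustrative integrand $F = \tfrac{1}{2}(y'^2 + y^2)$ on $[0, 1]$ with $y(0) = 0$, $y(1) = 1$, for which the exact extremal is $\sinh x / \sinh 1$; none of these choices comes from the text. Setting the partial derivatives of (13) with respect to $y_1, \ldots, y_n$ equal to zero gives a tridiagonal linear system, solved here by the standard Thomas elimination (the helper names are hypothetical).

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = d[i] - c[i] * y[i + 1]
    return y

def minimizing_polygon(n, A=0.0, B=1.0):
    """Vertices y_0..y_{n+1} minimizing the sum (13) for F = (y'^2 + y^2)/2 on [0, 1]."""
    dx = 1.0 / (n + 1)
    # Setting the partial derivative of (13) with respect to y_i to zero (i = 1..n) gives
    # -y_{i-1} + (2 + dx^2) y_i - y_{i+1} = 0.
    lower = [-1.0] * n
    diag = [2.0 + dx * dx] * n
    upper = [-1.0] * n
    rhs = [0.0] * n
    rhs[0] += A           # known boundary value y_0 moved to the right-hand side
    rhs[-1] += B          # known boundary value y_{n+1}
    return [A] + solve_tridiagonal(lower, diag, upper, rhs) + [B]

for n in (4, 19, 99):
    dx = 1.0 / (n + 1)
    y = minimizing_polygon(n)
    err = max(abs(yi - math.sinh(i * dx) / math.sinh(1.0)) for i, yi in enumerate(y))
    print(f"n = {n:2d}   max vertex error = {err:.2e}")
```

The vertex error falls roughly like $(\Delta x)^2$ as $n$ grows, so the polygonal lines do form a sequence of increasingly accurate approximate solutions in this example.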


5 Here, $\mathcal{M}$ will be a linear space only if $A = B = 0$ (cf. Remark 3).

41. The Sturm-Liouville Problem


In this section, we illustrate the application of direct variational methods
to differential equations (cf. the remarks on p. 192), by studying the following
boundary value problem, known as the Sturm-Liouville problem: Let
$P = P(x) > 0$ and $Q = Q(x)$ be two given functions, where $Q$ is continuous
and $P$ is continuously differentiable, and consider the differential equation

$$-(Py')' + Qy = \lambda y \tag{14}$$

(known as the Sturm-Liouville equation), subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0. \tag{15}$$

It is required to find the eigenfunctions and eigenvalues of the given boundary
value problem, i.e., the nontrivial solutions⁶ of (14), (15) and the corresponding
values of the parameter $\lambda$.


THEOREM. The Sturm-Liouville problem (14), (15) has an infinite
sequence of eigenvalues $\lambda^{(1)}, \lambda^{(2)}, \ldots$, and to each eigenvalue $\lambda^{(n)}$ there
corresponds an eigenfunction $y^{(n)}$ which is unique to within a constant
factor.


The proof of this theorem will be carried out in stages, and at the same
time we shall derive a method for approximating the eigenvalues $\lambda^{(n)}$ and
eigenfunctions $y^{(n)}$.


41.1. We begin by observing that (14) is the Euler equation corresponding
to the problem of finding an extremum of the quadratic functional

$$J[y] = \int_a^b (Py'^2 + Qy^2)\, dx, \tag{16}$$

subject to the boundary conditions (15) and the subsidiary condition⁷

$$\int_a^b y^2\, dx = 1. \tag{17}$$

Thus, if $y(x)$ is a solution of this variational problem, it is also a solution
of the differential equation (14), satisfying the boundary conditions (15).
Moreover, $y(x)$ is not identically zero, because of the condition (17).

6 In other words, the solutions which are not identically zero. For any value of $\lambda$,
(14) and (15) are trivially satisfied by the function $y(x) \equiv 0$.

7 Use the theorem on p. 43, changing $\lambda$ to $-\lambda$.

Next, we apply the Ritz method (see Sec. 40.1) to the functional (16), first
verifying that it is bounded from below, as required [cf. formula (1)]. Since
$P(x) > 0$, this fact follows from the inequality

$$\int_a^b (Py'^2 + Qy^2)\, dx > \int_a^b Qy^2\, dx \geq M \int_a^b y^2\, dx = M,$$

where

$$M = \min_{a \leq x \leq b} Q(x).$$

For simplicity, we assume that $a = 0$, $b = \pi$, and we choose $\{\sin nx\}$ as the
complete sequence of functions $\{\varphi_n(x)\}$ used in the Ritz method. This
sequence also has the desirable feature of being orthogonal, i.e.,

$$\int_0^\pi \sin kx\, \sin lx\, dx = 0 \qquad (k \neq l).$$

If a linear combination

$$\sum_{k=1}^{n} \alpha_k \sin kx \tag{18}$$

is to be admissible, it must satisfy the conditions (15) and (17). The condition
(15) is automatically satisfied by our choice of the functions $\sin nx$, but (17)
leads to the requirement

$$\int_0^\pi \left(\sum_{k=1}^{n} \alpha_k \sin kx\right)^2 dx = \frac{\pi}{2} \sum_{k=1}^{n} \alpha_k^2 = 1. \tag{19}$$

Moreover, for a linear combination (18), the functional $J[y]$ reduces to

$$J_n(\alpha_1, \ldots, \alpha_n) = \int_0^\pi \left[P(x)\left(\sum_{k=1}^{n} \alpha_k k \cos kx\right)^2 + Q(x)\left(\sum_{k=1}^{n} \alpha_k \sin kx\right)^2\right] dx, \tag{20}$$

which is a function of the $n$ variables $\alpha_1, \ldots, \alpha_n$ (in fact, a quadratic form
in these variables).

Thus, in terms of the variables $\alpha_1, \ldots, \alpha_n$, our problem is to minimize
$J_n(\alpha_1, \ldots, \alpha_n)$ on the surface $\sigma_n$ of the $n$-dimensional sphere with equation (19).
Since $\sigma_n$ is a compact set and $J_n(\alpha_1, \ldots, \alpha_n)$ is continuous on $\sigma_n$, $J_n(\alpha_1, \ldots, \alpha_n)$
has a minimum $\lambda_n^{(1)}$ at some point $\alpha_1^{(n)}, \ldots, \alpha_n^{(n)}$ of $\sigma_n$.⁸ Let

$$y_n^{(1)}(x) = \sum_{k=1}^{n} \alpha_k^{(n)} \sin kx$$

be the linear combination (18) achieving the minimum $\lambda_n^{(1)}$. If this procedure
is carried out for $n = 1, 2, \ldots$, we obtain a sequence of numbers

$$\lambda_1^{(1)}, \lambda_2^{(1)}, \ldots \tag{21}$$

and a corresponding sequence of functions

$$y_1^{(1)}(x), y_2^{(1)}(x), \ldots \tag{22}$$


8 See e.g., T. M. Apostol, op. cit., Theorem 4-20, p. 73.

Noting that $\sigma_n$ is the subset of $\sigma_{n+1}$ obtained by setting $\alpha_{n+1} = 0$, while

$$J_n(\alpha_1, \ldots, \alpha_n) = J_{n+1}(\alpha_1, \ldots, \alpha_n, 0),$$

we see that

$$\lambda_{n+1}^{(1)} \leq \lambda_n^{(1)}, \tag{23}$$

since enlarging the domain of definition of a function cannot increase its
minimum. It follows from (23) and the fact that $J[y]$ is bounded from below
that the limit

$$\lambda^{(1)} = \lim_{n \to \infty} \lambda_n^{(1)} \tag{24}$$

exists.
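
The numbers (21) can be computed for a concrete choice of coefficients. The sketch below assumes $P(x) = 1$ and $Q(x) = x$ on $[0, \pi]$, purely for illustration. It uses the standard fact that the minimum of a quadratic form on a sphere is the smallest eigenvalue of its matrix: with the normalization (19), $J_n = (\pi/2)\,\alpha^T B \alpha$ for the matrix `B` built below, so $\lambda_n^{(1)}$ is the smallest eigenvalue of `B`. The midpoint quadrature and the shifted power iteration used to find that eigenvalue are choices made here, not methods described in the text.

```python
import math

def P(x):
    return 1.0            # illustrative choice, P > 0

def Q(x):
    return x              # illustrative choice, continuous on [0, pi]

def ritz_matrix(n, cells=2000):
    """Matrix B with J_n(alpha) = (pi/2) * alpha^T B alpha for the form (20),
    computed with the midpoint rule."""
    h = math.pi / cells
    B = [[0.0] * n for _ in range(n)]
    for i in range(cells):
        x = (i + 0.5) * h
        p, q = P(x), Q(x)
        s = [math.sin(k * x) for k in range(1, n + 1)]
        c = [k * math.cos(k * x) for k in range(1, n + 1)]
        for a in range(n):
            for b in range(n):
                B[a][b] += (p * c[a] * c[b] + q * s[a] * s[b]) * h * 2.0 / math.pi
    return B

def smallest_eigenvalue(B, sweeps=2000):
    """Power iteration on shift*I - B; returns the Rayleigh quotient of B."""
    n = len(B)
    shift = max(sum(abs(v) for v in row) for row in B) + 1.0   # exceeds every eigenvalue
    v = [1.0] * n
    for _ in range(sweeps):
        w = [shift * v[a] - sum(B[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return sum(v[a] * B[a][b] * v[b] for a in range(n) for b in range(n))

lambdas = [smallest_eigenvalue(ritz_matrix(n)) for n in range(1, 6)]
for n, lam in enumerate(lambdas, start=1):
    print(f"n = {n}   lambda_n^(1) = {lam:.8f}")
```

For $n = 1$ the value is $1 + \pi/2$, and the later values decrease in accordance with (23) while remaining above the lower bound $M = \min Q = 0$, so they settle quickly toward a limit as (24) asserts.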


41.2. Now that we have proved the convergence of the sequence of 
numbers (21), representing the minima of the functional 


[ ey? + oO) dx 


on the sets of functions of the form 
n 
> a, sin kx 
k=1 


satisfying the condition (19), it is natural to try to prove the convergence 
of the sequence of functions (22) for which these minima are achieved. We 
first prove a weaker result: 


Lemma 1. The sequence {y)(x)} contains a uniformly convergent 
subsequence. 


Proof. For simplicity, we temporarily write $y_n(x)$ instead of $y_n^{(1)}(x)$.
The sequence

$$\lambda_n^{(1)} = \int_0^\pi (P y_n'^2 + Q y_n^2) \, dx$$

is convergent and hence bounded, i.e.,

$$\int_0^\pi (P y_n'^2 + Q y_n^2) \, dx \leq M$$

for all $n$, where $M$ is some constant. Therefore, since $\int_0^\pi y_n^2 \, dx = 1$ by (19),

$$\int_0^\pi P y_n'^2 \, dx \leq M + \left| \int_0^\pi Q y_n^2 \, dx \right| \leq M + \max_{0 \leq x \leq \pi} |Q(x)| = M_1,$$

and since $P(x) > 0$,

$$\int_0^\pi y_n'^2(x) \, dx \leq \frac{M_1}{\min_{0 \leq x \leq \pi} P(x)} = M_2. \tag{25}$$

Using (25), the condition

$$y_n(0) = 0,$$

and Schwarz’s inequality, we find that

$$|y_n(x)|^2 = \left| \int_0^x y_n'(\xi) \, d\xi \right|^2 \leq \int_0^x y_n'^2(\xi) \, d\xi \int_0^x d\xi \leq M_2 \pi,$$

so that $\{y_n(x)\}$ is uniformly bounded.⁹ Moreover, again using Schwarz’s
inequality, we have

$$|y_n(x_2) - y_n(x_1)|^2 = \left| \int_{x_1}^{x_2} y_n'(x) \, dx \right|^2 \leq \left| \int_{x_1}^{x_2} y_n'^2(x) \, dx \cdot \int_{x_1}^{x_2} dx \right| \leq M_2 |x_2 - x_1|,$$

so that $\{y_n(x)\}$ is equicontinuous.¹⁰ Thus, according to Arzela’s theorem,¹¹
we can select a uniformly convergent subsequence $\{y_{n_m}(x)\}$ from the
sequence $\{y_n(x)\}$, and Lemma 1 is proved.


We now set

$$y^{(1)}(x) = \lim_{m \to \infty} y_{n_m}(x). \tag{26}$$

Our object is to show that $y^{(1)}(x)$ satisfies the Sturm-Liouville equation (14)
with $\lambda = \lambda^{(1)}$. However, we are still not in a position to take the limit as
$m \to \infty$ of the integral

$$\int_0^\pi (P y_{n_m}'^2 + Q y_{n_m}^2) \, dx,$$

since as yet we know nothing about the convergence of the derivatives $y_{n_m}'$.
Therefore, the fact that for each $m$, the function $y_{n_m}$ minimizes the functional
$J[y]$ for $y$ in the $n_m$-dimensional space spanned by the linear combinations

$$\sum_{k=1}^{n_m} \alpha_k \sin kx$$

[subject to the condition (19) with $n = n_m$] still does not imply that the limit
function $y^{(1)}(x)$ minimizes $J[y]$ for $y$ in the full space of admissible functions.
To avoid this difficulty, we argue as follows:


LEMMA 2. Let $y(x)$ be continuous in $[0, \pi]$, and let

$$\int_0^\pi [-(Ph')' + Q_1 h] \, y \, dx = 0 \tag{27}$$

for every function $h(x) \in \mathcal{D}_2(0, \pi)$¹² satisfying the boundary conditions

$$h(0) = h(\pi) = 0, \qquad h'(0) = h'(\pi) = 0. \tag{28}$$

Then $y(x)$ also belongs to $\mathcal{D}_2(0, \pi)$, and

$$-(Py')' + Q_1 y = 0.$$

⁹ A family of functions $\Psi$ defined on $[a, b]$ is said to be uniformly bounded if there is
a constant $M$ such that

$$|\psi(x)| \leq M$$

for all $\psi \in \Psi$ and all $a \leq x \leq b$.

¹⁰ A family of functions $\Psi$ defined on $[a, b]$ is said to be equicontinuous if given any
$\varepsilon > 0$, there is a $\delta > 0$ such that

$$|\psi(x_2) - \psi(x_1)| < \varepsilon$$

for all $\psi \in \Psi$, provided that $|x_2 - x_1| < \delta$.

¹¹ Arzela’s theorem states that every uniformly bounded and equicontinuous sequence
of functions contains a uniformly convergent subsequence (converging to a continuous
limit function). See e.g., R. Courant and D. Hilbert, op. cit., vol. 1, p. 59.
Proof. If we integrate (27) by parts and use (28), we find that

$$\int_0^\pi [-(Ph')' + Q_1 h] \, y \, dx = -\int_0^\pi P h'' y \, dx - \int_0^\pi P' h' y \, dx + \int_0^\pi Q_1 h y \, dx$$

$$= \int_0^\pi h'' \left[ -Py + \int_0^x P' y \, d\xi + \int_0^x \left( \int_0^\xi Q_1 y \, dt \right) d\xi \right] dx = 0.$$

It follows from Lemma 3, p. 10 that

$$-Py + \int_0^x P' y \, d\xi + \int_0^x \left( \int_0^\xi Q_1 y \, dt \right) d\xi = c_0 + c_1 x, \tag{29}$$

where $c_0$ and $c_1$ are constants. Since the right-hand side and the
second and third terms in the left-hand side of (29) are obviously
differentiable, $(Py)'$ exists, and in fact, differentiating (29) term by term,
we find that

$$-(Py)' + P'y + \int_0^x Q_1 y \, d\xi = c_1. \tag{30}$$

Since the function $P$ is continuously differentiable and does not vanish,
$y'$ exists and is continuous. Thus, (30) reduces to

$$-Py' + \int_0^x Q_1 y \, d\xi = c_1. \tag{31}$$

Since the right-hand side and the second term in the left-hand side of (31)
are differentiable, it follows that $(Py')'$ exists, and in fact

$$-(Py')' + Q_1 y = 0,$$

as asserted. Moreover, by the same argument as before, $y''$ exists and is
continuous.


41.3. We can now show that the function $y^{(1)}(x)$ defined by (26), whose
existence follows from Lemma 1, satisfies the Sturm-Liouville equation

$$-(P y^{(1)\prime})' + Q y^{(1)} = \lambda^{(1)} y^{(1)}, \tag{32}$$

where $\lambda^{(1)}$ is the limit (24). According to the theory of Lagrange multipliers
(cf. footnote 7, p. 43), at the point $(\alpha_1^{(1)}, \ldots, \alpha_n^{(1)})$ where the quadratic form
(20) achieves its minimum subject to the subsidiary condition (19), we have

$$\frac{\partial}{\partial \alpha_r} \left\{ J_n(\alpha_1, \ldots, \alpha_n) - \lambda_n^{(1)} \int_0^\pi \left( \sum_{k=1}^n \alpha_k \sin kx \right)^2 dx \right\} = 0 \qquad (r = 1, \ldots, n).$$

This leads to the $n$ equations

$$\int_0^\pi \left\{ P(x) \left[ \sum_{k=1}^n \alpha_k^{(1)} (\sin kx)' \right] (\sin rx)' + [Q(x) - \lambda_n^{(1)}] \left[ \sum_{k=1}^n \alpha_k^{(1)} \sin kx \right] \sin rx \right\} dx = 0 \qquad (r = 1, \ldots, n). \tag{33}$$

Multiplying each of the equations (33) by an arbitrary constant $C_r^{(n)}$ and
summing over $r$ from 1 to $n$, we obtain

$$\int_0^\pi [P y_n' h_n' + (Q - \lambda_n^{(1)}) y_n h_n] \, dx = 0, \tag{34}$$

where

$$h_n(x) = \sum_{r=1}^n C_r^{(n)} \sin rx. \tag{35}$$

An integration by parts transforms (34) into

$$\int_0^\pi [-(P h_n')' + (Q - \lambda_n^{(1)}) h_n] \, y_n \, dx = 0. \tag{36}$$

If $h(x)$ is an arbitrary function in $\mathcal{D}_2(0, \pi)$ satisfying the boundary conditions
(28), we can choose the coefficients $C_r^{(n)}$ in such a way that

$$h_n \Rightarrow h, \qquad h_n' \Rightarrow h', \qquad h_n'' \Rightarrow h''$$

(see Prob. 8). Here, the symbol $\Rightarrow$ denotes convergence in the mean, i.e.,
$h_n \Rightarrow h$ stands for

$$\lim_{n \to \infty} \int_0^\pi |h_n(x) - h(x)|^2 \, dx = 0.$$

Since $y_{n_m}^{(1)} \to y^{(1)}$ uniformly in $[0, \pi]$,¹³ it follows from (36) that

$$\lim_{m \to \infty} \int_0^\pi [-(P h_{n_m}')' + (Q - \lambda_{n_m}^{(1)}) h_{n_m}] \, y_{n_m}^{(1)} \, dx = \int_0^\pi [-(Ph')' + (Q - \lambda^{(1)}) h] \, y^{(1)} \, dx = 0$$

(see Prob. 9). The fact that $y^{(1)}$ is an element of $\mathcal{D}_2(0, \pi)$ and satisfies the
Sturm-Liouville equation (32) is now an immediate consequence of Lemma 2,
with $Q_1 = Q - \lambda^{(1)}$.

¹² I.e., for every $h(x)$ with continuous first and second derivatives in $[0, \pi]$.


¹³ We now restore the superscript on $y_n^{(1)}$.

So far, the function $y^{(1)}(x)$ has been defined as the limit of a subsequence
$\{y_{n_m}^{(1)}(x)\}$ of the original sequence $\{y_n^{(1)}(x)\}$. We now show that the sequence
$\{y_n^{(1)}(x)\}$ itself converges to $y^{(1)}(x)$. To prove this, we use the fact that for a
given $\lambda$, the solution of the Sturm-Liouville equation

$$-(Py')' + Qy = \lambda y \tag{37}$$

satisfying the boundary conditions

$$y(0) = 0, \qquad y(\pi) = 0 \tag{38}$$

and the normalization condition

$$\int_0^\pi y^2(x) \, dx = 1 \tag{39}$$

is unique except for sign. Let $y^{(1)}(x)$ be a solution of (37) corresponding
to $\lambda = \lambda^{(1)}$, and suppose $y^{(1)}(x_0) \neq 0$ at some point $x_0$ in $[0, \pi]$. Then
choose the sign so that $y^{(1)}(x_0) > 0$. Similarly, let $y_n^{(1)}(x)$ be a solution
of (37) corresponding to $\lambda = \lambda_n^{(1)}$, and choose the signs so that $y_n^{(1)}(x_0) > 0$
for all $n$. If $y_n^{(1)}(x)$ does not converge to $y^{(1)}(x)$, we can select another
subsequence from $\{y_n^{(1)}(x)\}$ converging to another solution $\bar{y}^{(1)}(x)$ of (37),
where again $\lambda = \lambda^{(1)}$. Because of the uniqueness (except for sign) of
solutions of (37), subject to (38) and (39), this means that

$$\bar{y}^{(1)}(x) = -y^{(1)}(x),$$

and hence $\bar{y}^{(1)}(x_0) < 0$, which is impossible, since $y_n^{(1)}(x_0) > 0$ for all $n$.
Therefore, $y_n^{(1)}(x) \to y^{(1)}(x)$ [in fact, uniformly], provided we choose each
$y_n^{(1)}(x)$ with the proper sign.

41.4. We have just proved that the Sturm-Liouville problem has the
eigenfunction $y^{(1)}(x)$, corresponding to the eigenvalue $\lambda^{(1)}$. The “next”
eigenfunction $y^{(2)}(x)$ and the corresponding eigenvalue $\lambda^{(2)}$ can be found by
minimizing the quadratic functional

$$J[y] = \int_0^\pi (P y'^2 + Q y^2) \, dx \tag{40}$$

subject to the same conditions (38) and (39) as before, plus an extra
orthogonality condition

$$\int_0^\pi y^{(1)}(x) \, y(x) \, dx = 0. \tag{41}$$

In fact, substituting

$$y_n(x) = \sum_{k=1}^n \alpha_k \sin kx \tag{42}$$

into (40), we again obtain the quadratic form $J_n(\alpha_1, \ldots, \alpha_n)$ given by (20),
but this time we study $J_n(\alpha_1, \ldots, \alpha_n)$ on the set of functions of the form (42)
which not only lie on the $n$-dimensional sphere $\sigma_n$ with equation (19), thereby
satisfying the normalization condition (39), but are also orthogonal to the
function

$$y_n^{(1)}(x) = \sum_{k=1}^n \alpha_k^{(1)} \sin kx,$$

i.e., satisfy the condition

$$\sum_{k=1}^n \alpha_k \int_0^\pi \sin kx \left( \sum_{l=1}^n \alpha_l^{(1)} \sin lx \right) dx = \frac{\pi}{2} \sum_{k=1}^n \alpha_k \alpha_k^{(1)} = 0. \tag{43}$$


This is the equation of an $(n - 1)$-dimensional hyperplane, passing through
the origin of coordinates in $n$ dimensions. Its intersection with the sphere
(19) is an $(n - 1)$-dimensional sphere $\hat{\sigma}_{n-1}$. By the same argument as before
(cf. footnote 8), $J_n(\alpha_1, \ldots, \alpha_n)$ has a minimum $\lambda_n^{(2)}$ on $\hat{\sigma}_{n-1}$. It is not hard
to see that

$$\lambda_{n+1}^{(2)} \leq \lambda_n^{(2)}$$

[cf. (23)], and hence the limit

$$\lambda^{(2)} = \lim_{n \to \infty} \lambda_n^{(2)}$$

exists, since $J[y]$ is bounded from below. Moreover, it is obvious that

$$\lambda^{(1)} \leq \lambda^{(2)}. \tag{44}$$

Now let

$$y_n^{(2)}(x) = \sum_{k=1}^n \alpha_k^{(2)} \sin kx$$

be the linear combination (42) achieving the minimum $\lambda_n^{(2)}$, where, of course,
the point $(\alpha_1^{(2)}, \ldots, \alpha_n^{(2)})$ lies on the sphere $\hat{\sigma}_{n-1}$. As before, we can show
that the sequence $\{y_n^{(2)}(x)\}$ converges uniformly to a limit function $y^{(2)}(x)$
which satisfies the Sturm-Liouville equation (37) [with $\lambda = \lambda^{(2)}$], the
boundary conditions (38), the normalization condition (39), and the orthogonality
condition (41). In other words, $y^{(2)}(x)$ is the eigenfunction of the
Sturm-Liouville problem corresponding to the eigenvalue $\lambda^{(2)}$. Since orthogonal
functions cannot be linearly dependent, and since only one eigenfunction
corresponds to each eigenvalue (except for a constant factor), we have the
strict inequality

$$\lambda^{(1)} < \lambda^{(2)},$$

instead of (44). Finally, we note that by repeating the above argument,
with obvious modifications, we can obtain further eigenvalues $\lambda^{(3)}, \lambda^{(4)}, \ldots$,
and corresponding eigenfunctions $y^{(3)}(x), y^{(4)}(x), \ldots$.
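The passage from $\lambda_n^{(1)}$ to $\lambda_n^{(2)}$ can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes $P(x) = 1$, $Q(x) = x$ (our choice), uses the closed-form entries of the rescaled matrix of the form (20) for that case, and minimizes on the unit sphere with and without the orthogonality condition (43) by a projected power iteration. The names `form_matrix`, `constrained_minimum` and `first_two_minima` are hypothetical.

```python
import math

def form_matrix(n):
    """Rescaled matrix of the form (20) for the assumed case P(x) = 1, Q(x) = x.

    Entry (k, r) is (2/pi) times the integral over [0, pi] of
    k*r*cos(kx)*cos(rx) + x*sin(kx)*sin(rx), evaluated in closed form.
    """
    M = [[0.0] * n for _ in range(n)]
    for k in range(1, n + 1):
        for r in range(1, n + 1):
            if k == r:
                M[k - 1][r - 1] = k * k + math.pi / 2
            elif (k + r) % 2 == 1:
                M[k - 1][r - 1] = -(2 / math.pi) * (1 / (k - r) ** 2 - 1 / (k + r) ** 2)
    return M

def constrained_minimum(M, orth=(), iters=5000):
    """Minimum of beta^T M beta on |beta| = 1, with beta orthogonal to the unit
    vectors in orth. Requires len(M) > len(orth). Returns (minimum, minimizer)."""
    n = len(M)
    c = max(sum(abs(a) for a in row) for row in M) + 1.0  # exceeds all eigenvalues
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [c * v[i] - sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        for u in orth:  # enforce the orthogonality condition (43)
            d = sum(a * b for a, b in zip(w, u))
            w = [a - d * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Mv)), v

def first_two_minima(n):
    """Approximations to lambda_n^(1) and lambda_n^(2) for n >= 2."""
    M = form_matrix(n)
    lam1, v1 = constrained_minimum(M)
    lam2, _ = constrained_minimum(M, orth=[v1])
    return lam1, lam2
```

For instance, `first_two_minima(5)` returns a pair whose first entry should be strictly smaller than the second, in line with the strict inequality between the first two eigenvalues.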

For further material on the use of direct methods in the calculus of
variations, we refer the reader to the abundant literature on the subject.¹⁴

¹⁴ See e.g., N. Krylov, Les méthodes de solution approchée des problèmes de la physique
mathématique, Mémorial des Sciences Mathématiques, fascicule 49, Gauthier-Villars
et Cie., Paris (1931); S. G. Mikhlin, Прямые Методы в Математической Физике
(Direct Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1950);
S. G. Mikhlin, Вариационные Методы в Математической Физике (Variational
Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1957); L. V.
Kantorovich and V. I. Krylov, Approximate Methods of Higher Analysis, translated
by C. D. Benster, Interscience Publishers, Inc., New York (1958).
PROBLEMS


1. Let the functional $J[y]$ be such that $J[y] > -\infty$ for some admissible
function, and let

$$\sup J[y] = \mu < +\infty,$$

where sup denotes the least upper bound or supremum. By analogy with the
treatment given in Sec. 39, define a maximizing sequence, and then state and
prove the corresponding version of the theorem on p. 194.


2. Use the Ritz method to find an approximate solution of the problem of
minimizing the functional

$$J[y] = \int_0^1 (y'^2 - y^2 - 2xy) \, dx, \qquad y(0) = y(1) = 0,$$

and compare the answer with the exact solution.

Hint. Choose the sequence $\{\varphi_n(x)\}$ (see p. 195) to be

$$x(1 - x), \quad x^2(1 - x), \quad x^3(1 - x), \ldots$$


3. Use the Ritz method to find an approximate solution of the extremum
problem associated with the functional

$$J[y] = \int_0^1 (x^3 y''^2 + 100 x y^2 - 20 x y) \, dx, \qquad y(1) = y'(1) = 0.$$

Hint. Choose the sequence $\{\varphi_n(x)\}$ to be

$$(x - 1)^2, \quad x(x - 1)^2, \quad x^2(x - 1)^2, \ldots$$

4. Use the Ritz method to find an approximate solution of the problem of
minimizing the functional

$$J[y] = \int_0^2 (y'^2 + y^2 + 2xy) \, dx, \qquad y(0) = y(2) = 0,$$

and compare the answer with the exact solution.


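5. Use the Ritz method to find an approximate solution of the equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = -1$$

inside the square

$$R\colon \; -a \leq x \leq a, \quad -a \leq y \leq a,$$

where $u$ vanishes on the boundary of $R$.

Hint. Study the functional

$$J[u] = \iint_R \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 - 2u \right] dx \, dy,$$

and choose the two-dimensional generalization of the sequence $\{\varphi_n(x)\}$ to be

$$(x^2 - a^2)(y^2 - a^2), \quad (x^2 + y^2)(x^2 - a^2)(y^2 - a^2), \ldots$$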




6. Write the Sturm-Liouville equation associated with the quadratic functional

$$J[y] = \int_a^b (c y'^2 + c_1 y^2) \, dx,$$

where $c$ and $c_1 > 0$ are constants, subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0.$$

Find the corresponding eigenvalues and eigenfunctions.


7. Formulate a variational problem leading to the Sturm-Liouville equation
(14) subject to the boundary conditions

$$y'(a) = 0, \qquad y'(b) = 0,$$

instead of the boundary conditions (15).

Hint. Recall the natural boundary conditions (29) of Sec. 6.

8. Prove that any function $h(x) \in \mathcal{D}_2(0, \pi)$ satisfying the boundary conditions
(28) can be approximated in the mean by a linear combination

$$h_n(x) = \sum_{r=1}^n C_r^{(n)} \sin rx,$$

where at the same time $h_n'(x)$ approximates $h'(x)$ and $h_n''(x)$ approximates
$h''(x)$ [in the mean]. Show that the coefficients $C_r^{(n)}$ need not depend on $n$
and can be written simply as $C_r$.

Hint. Form the Fourier sine series of $h''(x)$ and integrate it twice term by
term.


9. Show that if $f_n(x) \Rightarrow f(x)$ in the mean and $g_n(x) \to g(x)$ uniformly in some
interval $[a, b]$, then

$$\lim_{n \to \infty} \int_a^b f_n(x) g_n(x) \, dx = \int_a^b f(x) g(x) \, dx.$$

Hint. Use Schwarz’s inequality.
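As a computational companion to Prob. 2, the Ritz coefficients for the trial functions $x^k(1 - x)$ can be obtained in exact rational arithmetic. The following is a sketch only, under the assumption that the minimizing coefficients solve the linear system obtained by setting the partial derivatives of $J[c_1\varphi_1 + \cdots + c_n\varphi_n]$ equal to zero; all function names are our own, and the exact solution used for comparison, $y = \sin x / \sin 1 - x$, comes from solving the Euler equation $y'' + y + x = 0$ with the given boundary conditions.

```python
import math
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """Derivative of a polynomial."""
    return [i * a[i] for i in range(1, len(a))] or [Fraction(0)]

def poly_int01(a):
    """Integral of a polynomial over [0, 1]."""
    return sum(c * Fraction(1, i + 1) for i, c in enumerate(a))

def basis(k):
    """phi_k(x) = x**k * (1 - x), k = 1, 2, ..."""
    return [Fraction(0)] * k + [Fraction(1), Fraction(-1)]

def ritz_coefficients(n):
    """Coefficients c_1, ..., c_n making J[c_1 phi_1 + ... + c_n phi_n] stationary."""
    phis = [basis(k) for k in range(1, n + 1)]
    # Stationarity gives sum_j a_ij c_j = b_i, with
    # a_ij = int (phi_i' phi_j' - phi_i phi_j) dx and b_i = int x phi_i dx.
    A = [[poly_int01(poly_mul(poly_diff(p), poly_diff(q))) - poly_int01(poly_mul(p, q))
          for q in phis] for p in phis]
    b = [poly_int01(poly_mul([Fraction(0), Fraction(1)], p)) for p in phis]
    for i in range(n):  # Gauss-Jordan elimination with exact fractions
        piv = next(r for r in range(i, n) if A[r][i] != 0)
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(n):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
                b[r] -= f * b[i]
    return [b[i] / A[i][i] for i in range(n)]

def ritz_value(coeffs, x):
    """Value of the Ritz approximation at the point x."""
    return sum(float(c) * x ** k * (1 - x) for k, c in enumerate(coeffs, start=1))

def exact_solution(x):
    """Solution of y'' + y + x = 0 with y(0) = y(1) = 0."""
    return math.sin(x) / math.sin(1.0) - x
```

Comparing `ritz_value(ritz_coefficients(n), x)` with `exact_solution(x)` at a few points of $[0, 1]$ shows how quickly even one or two trial functions approach the exact extremal.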


Appendix I 


PROPAGATION OF DISTURBANCES
AND THE
CANONICAL EQUATIONS¹


In this appendix, we consider the propagation of “disturbances” in a
medium which is regarded as being both inhomogeneous and anisotropic. 
Thus, in general, the velocity of propagation of a disturbance at a given point 
of the medium will depend both on the position of the point and on the 
direction of propagation of the disturbance. We also make the following 
two assumptions about the process under consideration: 


1. Each point can be in only one of two states, excitation or rest, i.e., no 
concept of the intensity of the disturbance is introduced. 


2. If a disturbance arrives at the point $P$ at the time $t$, then starting from
the time $t$, the point $P$ itself serves as a source of further disturbances
propagating in the medium.


In the analysis given here, our aim is to show that a study of processes
of excitation of the kind described, together with purely geometric
considerations, can be used to derive such basic concepts of the calculus of variations
as the canonical equations, the Hamiltonian function, the Hamilton-Jacobi
equation, etc. The treatment given here does not rely upon the derivations
of these concepts given in the main body of the book (see Secs. 16, 23), and in
fact can be used to replace the previous derivations. The reader acquainted
with optics will recognize that we are essentially constructing a
mathematical model of the familiar Huygens’ principle.²

¹ The authors would like to acknowledge discussions with M. L. Tsetlyn on the
material presented here.


1. Statement of the problem. Let the medium in which the disturbance
propagates fill a space $\mathcal{X}$, which for simplicity we take to be $n$-dimensional
Euclidean space. Thus, every point $x \in \mathcal{X}$ is specified by a set of $n$ real
numbers $x^1, \ldots, x^n$. Choosing a fixed point $x_0 \in \mathcal{X}$, we consider the set of
all smooth curves

$$x = x(s) \tag{1}$$

passing through $x_0$. The set of vectors tangent to the curves (1) at the point
$x_0$, i.e., the set of vectors

$$x' = \left. \frac{dx}{ds} \right|_{x = x_0},$$

forms an $n$-dimensional linear space, which we call the tangent space to $\mathcal{X}$ at
$x_0$ and denote by $\mathcal{T}(x_0)$. Note that the end points of the vectors in any
tangent space $\mathcal{T}(x)$ are points of $\mathcal{X}$ itself.³

Since the medium is inhomogeneous and anisotropic, the velocity of
propagation of disturbances in $\mathcal{X}$ depends on position and direction, i.e.,
on $x$ and $x'$. Let $f(x, x')$ denote the reciprocal of this velocity. Then, if
$x(s)$ and $x(s + ds)$ are two neighboring points lying on some curve $x = x(s)$,
the time $dt$ which it takes the disturbance to go from the point $x(s)$ to the
point $x(s + ds)$ can be written in the form

$$dt = f\left(x, \frac{dx}{ds}\right) ds,$$

and the time it takes the disturbance to propagate along some finite path
joining the points $x_0 = x(s_0)$ and $x_1 = x(s_1)$ equals

$$\int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds. \tag{2}$$


² See e.g., B. B. Baker and E. T. Copson, The Mathematical Theory of Huygens’
Principle, Oxford University Press, New York (1939).

³ In the case considered, the tangent space $\mathcal{T}(x)$ is particularly simple, and in fact,
is just an $n$-dimensional Euclidean space with origin at $x$. More generally, $\mathcal{X}$ can be an
$n$-dimensional differentiable manifold, and then the end points of vectors in $\mathcal{T}(x)$ need
no longer lie in $\mathcal{X}$. However, the analysis given below can easily be extended to this
case, by exploiting the “local flatness” of $\mathcal{X}$.

Suppose the point $x_0$ is “excited,” and consider all possible paths joining
$x_0$ and $x_1$. Then, because of the “off or on” character of the excitation,
the only path which plays any role in the propagation process is the one along
which the disturbance propagates in the smallest time, say $\tau$. (Disturbances
arriving at $x_1$ via some other path which is traversed in a time $> \tau$ will arrive
at $x_1$ “too late” to have any further effect on the propagation process,
since $x_1$ will already be found in a state of excitation.) In other words,

$$\tau = \min \int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds,$$

where the minimum is taken with respect to all curves $x = x(s)$ joining the
points $x_0$ and $x_1$. Thus, the propagation of disturbances in the medium
obeys the familiar Fermat principle (p. 34), i.e., among all paths joining $x_0$
and $x_1$, the disturbance always propagates along the path which it traverses
in the least time. We shall refer to such paths as the trajectories of the
disturbance.
Next, we state a physically plausible set of properties for the function
$f(x, x')$:

1. The propagation time along any curve is positive, and hence

$$f(x, x') > 0 \quad \text{if} \quad x' \neq 0. \tag{3}$$

2. The propagation time along any curve $\gamma$ joining $x_0$ and $x_1$, given by
the integral (2), depends only on $\gamma$ and not on how $\gamma$ is parameterized.
It follows by the argument given in Chap. 2, Sec. 10 that $f(x, x')$ is
positive-homogeneous of degree 1 in $x'$:

$$f(x, \lambda x') = \lambda f(x, x') \quad \text{for every} \quad \lambda > 0. \tag{4}$$

In particular, (4) implies that

$$f(x, x' + \bar{x}') = f(x, x') + f(x, \bar{x}'), \tag{5}$$

if $\bar{x}' = \lambda x'$, where $\lambda > 0$.

3. The time it takes a disturbance to traverse a curve $\gamma$ connecting $x_0$ to $x_1$
is the same as the time it takes a disturbance to traverse $\gamma$ in the opposite
direction from $x_1$ to $x_0$, and hence

$$f(x, -x') = f(x, x'). \tag{6}$$

4. If the medium is homogeneous, so that $f$ is a function of direction only,
then the disturbance propagates in straight lines (see Prob. 1). In
particular, no disturbance emanating from a given point $x_0$ can arrive
at another point $x_1$ more quickly by taking a path consisting of two
straight line segments than by going along the straight line segment
joining $x_0$ and $x_1$. This implies the convexity condition

$$f(x' + \bar{x}') \leq f(x') + f(\bar{x}')$$

(see Prob. 2). If $f$ depends on $x$ in a sufficiently smooth way (e.g., if
the derivatives $\partial f/\partial x^1, \ldots, \partial f/\partial x^n$ exist), the same argument shows that
the convexity condition

$$f(x, x' + \bar{x}') \leq f(x, x') + f(x, \bar{x}') \tag{7}$$

holds for sufficiently small $x'$, $\bar{x}'$, but then (7) holds for all $x'$, $\bar{x}'$ because
of the homogeneity property (4).

5. Actually, we strengthen the condition (7) somewhat, by requiring that
$f$ satisfy the strict convexity condition, consisting of (7) plus the
stipulation that (5) holds only if $\bar{x}' = \lambda x'$, where $\lambda > 0$.


Now suppose we have a disturbance which at time $t = 0$ occupies some
region of excitation $R$ in $\mathcal{X}$, and propagates further as time evolves. The
boundary of $R$ will be called the wave front. Let

$$S(x, t) = 0$$

be the equation of the wave front at the time $t$. Then our problem can
be stated as follows: Find the equation satisfied by the function $S(x, t)$
describing the wave front, and find the equations of the trajectories of the
disturbance.


2. Introduction of a norm in $\mathcal{T}(x)$. Our next step is to use the function
$f(x, x')$ to introduce a norm in the $n$-dimensional tangent space $\mathcal{T}(x)$. This
can be done by defining the norm of the vector $x' = 0$ to be zero and setting

$$\|x'\| = f(x, x') \tag{8}$$

for all vectors $x' \neq 0$ in $\mathcal{T}(x)$. The fact that $\|x'\|$ actually meets all the
requirements for a norm (see p. 6) is an immediate consequence of (3), (4), (6)
and (7). The set of all vectors in $\mathcal{T}(x)$ such that

$$f(x, x') = \|x'\| = \alpha \tag{9}$$

is called a sphere of radius $\alpha$ in $\mathcal{T}(x)$, with center at the point $x$. The sphere
(9) is just the boundary of the closed region of $\mathcal{T}(x)$ [and hence of $\mathcal{X}$] which
is excited during the time $\alpha$ by a disturbance originally concentrated at the
point $x$. In this language, our problem can be rephrased as follows:
Suppose a tangent space $\mathcal{T}(x)$, equipped with the norm (8) satisfying the strict
convexity condition, is defined at each point $x$ of an $n$-dimensional space $\mathcal{X}$.
Find the equations describing the propagation of disturbances in $\mathcal{X}$, if during
the time $dt$ the disturbance originally at $x$ “spreads out and fills” the sphere

$$f(x, dx) = dt.$$


3. The conjugate space $\tilde{\mathcal{T}}(x)$. Let $\varphi[x']$ be a linear functional (see p. 8),
defined on the tangent space $\mathcal{T}(x)$. Then there is a unique vector

$$p = (p_1, \ldots, p_n)$$

such that⁴

$$\varphi[x'] = (p, x')$$

for all $x' \in \mathcal{T}(x)$, where by $(p, x')$ is meant the scalar product

$$\sum_{i=1}^n p_i x'^i = p_1 x'^1 + \cdots + p_n x'^n$$

(see Prob. 3). Conversely, any scalar product $(p, x')$ obviously defines a
linear functional on $\mathcal{T}(x)$. The set of all linear functionals on $\mathcal{T}(x)$, or
equivalently the set of all vectors $p$, is itself an $n$-dimensional linear space,
called the conjugate space of $\mathcal{T}(x)$ and denoted by $\tilde{\mathcal{T}}(x)$. We define the
norm of a vector $p \in \tilde{\mathcal{T}}(x)$ by the formula⁵

$$\|p\| = \sup \frac{(p, x')}{\|x'\|}, \tag{10}$$

where the least upper bound is taken over all vectors $x' \neq 0$ in $\mathcal{T}(x)$ [see
Prob. 4]. In the present context, we write $H(x, p)$ instead of $\|p\|$, i.e.,

$$H(x, p) = \sup_{x'} \frac{(p, x')}{f(x, x')}. \tag{11}$$

It can be shown that the transition from the function $f(x, x')$ to the function
$H(x, p)$ defined by (11) is just the parametric form of the Legendre
transformation discussed in Sec. 18.
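The definition (11) can be checked on a simple example. The sketch below assumes a homogeneous, anisotropic plane medium with $f(x') = \sqrt{a_1 (x'^1)^2 + a_2 (x'^2)^2}$, which is our own illustrative choice and not a case treated in the text. Since both $(p, x')$ and $f$ are homogeneous of degree 1 in $x'$, it suffices to sample unit directions; for this particular $f$, Schwarz’s inequality gives the closed form $H(p) = \sqrt{p_1^2/a_1 + p_2^2/a_2}$, against which the sampled supremum is compared.

```python
import math

A = (1.0, 4.0)  # assumed positive coefficients a_1, a_2 of the example medium

def f(xp, a=A):
    """Example reciprocal-velocity function f(x') = sqrt(a1*x'1^2 + a2*x'2^2)."""
    return math.sqrt(a[0] * xp[0] ** 2 + a[1] * xp[1] ** 2)

def H_numeric(p, samples=20000):
    """Approximate (11): supremum over sampled directions x' of (p, x') / f(x')."""
    best = -math.inf
    for j in range(samples):
        th = 2 * math.pi * j / samples
        xp = (math.cos(th), math.sin(th))
        best = max(best, (p[0] * xp[0] + p[1] * xp[1]) / f(xp))
    return best

def H_closed(p, a=A):
    """Closed form of the conjugate norm for the example f above."""
    return math.sqrt(p[0] ** 2 / a[0] + p[1] ** 2 / a[1])
```

The sampled value can never exceed the closed form, and it approaches it as the number of sampled directions grows.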


⁴ The reader familiar with tensor analysis will note that here we make a distinction
between contravariant vectors like $x'$, with components $x'^i$ indexed by superscripts,
and covariant vectors like $p$, with components $p_i$ indexed by subscripts. See e.g.,
G. E. Shilov, op. cit., Sec. 39.

⁵ By sup is meant the least upper bound or supremum.

4. The propagation process. Suppose the wave front at the time $t$ is the
surface $\sigma_t$, with equation

$$S(x, t) = 0. \tag{12}$$

We now examine in more detail the mechanism governing the evolution of $\sigma_t$
in time. By hypothesis, each point of $\sigma_t$ serves as a source of new
disturbances, which during the time $dt$ excite the region bounded by the sphere

$$f(x, dx) = dt. \tag{13}$$

Since the function $f(x, x')$ determining the propagation process is assumed to
be differentiable and strictly convex (in the sense explained above), there is a
unique hyperplane tangent to each point of the sphere (13), and this
hyperplane has only one point in common with the sphere, i.e., its point of tangency.
If we construct a family of spheres (13), one for each point $x \in \sigma_t$, then the
wave front $\sigma_{t+dt}$ at the time $t + dt$, with equation

$$S(x, t + dt) = 0, \tag{14}$$

is just the envelope $E$ of this family of spheres. In fact, $E$ is the “interface”
separating the points of $\mathcal{X}$ which can be reached from $\sigma_t$ in times $\leq dt$ from
the points which can only be reached from $\sigma_t$ in times $> dt$. This construction
has two important implications:

1. Given a point $x \in \sigma_t$, there is a unique point $x + dx \in \sigma_{t+dt}$ which is
excited after the time $dt$ by a disturbance initially at $x$. In fact, $x + dx$
is the point of $\sigma_{t+dt}$ lying on the (unique) hyperplane tangent to both
(13) and $\sigma_{t+dt}$. To see this, we observe that it takes a time $> dt$ for a
disturbance starting from $x$ to reach any other point of $\sigma_{t+dt}$.⁶ Thus,
there is a unique direction of propagation defined at each point $x \in \sigma_t$,
and it is clear that a disturbance leaving $x$ in this direction will arrive at
the surface $\sigma_{t+dt}$ more quickly than a disturbance leaving $x$ in any other
direction, as required by Fermat’s principle.

2. Conversely, given a point $x + dx \in \sigma_{t+dt}$, there is a unique point
$x \in \sigma_t$, which at the time $t$ was the source of the disturbance reaching
$x + dx$ at the time $t + dt$. In fact, $x$ is just the center of the (unique)
sphere of radius $dt$ which shares a tangent hyperplane with $\sigma_{t+dt}$.


⁶ Physically, this means that if the surface $\sigma_t$ is changed only in a small neighborhood
of the point $x$, the surface $\sigma_{t+dt}$ is also changed only in a small neighborhood of $x + dx$.

5. The Hamilton-Jacobi equation. As was just shown, every hyperplane
tangent to the surface $\sigma_{t+dt}$ with equation (14) must also be tangent to some
sphere of radius $dt$ whose center lies on the surface $\sigma_t$ with equation (12).
This fact can be used to derive a differential equation satisfied by the function
$S(x, t)$. First, we observe that every hyperplane in the tangent space $\mathcal{T}(x)$
can be written in the form

$$\sum_{i=1}^n p_i x'^i = \text{const},$$

where $p = (p_1, \ldots, p_n)$ is a vector in the conjugate space $\tilde{\mathcal{T}}(x)$. Let $x + dx$
be an arbitrary point of $\sigma_{t+dt}$, whose “source” is the point $x \in \sigma_t$. Then
the hyperplane in $\mathcal{T}(x)$ tangent to $\sigma_{t+dt}$ at $x + dx$ has the equation

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i} \, dx^i = c, \tag{15}$$

where $c$ is a constant. If the hyperplane (15) is also tangent to the sphere
(13), as required, then $c$ equals the norm of the vector

$$\nabla S = \left( \frac{\partial S}{\partial x^1}, \ldots, \frac{\partial S}{\partial x^n} \right)$$

multiplied by the radius of the sphere, i.e.,

$$c = H(x, \nabla S) \, dt.$$

Therefore, (15) becomes

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i} \, dx^i = H(x, \nabla S) \, dt. \tag{16}$$

But

$$\frac{\partial S}{\partial t} \, dt + \sum_{i=1}^n \frac{\partial S}{\partial x^i} \, dx^i = 0, \tag{17}$$

because of the meaning of $x$ and $x + dx$. Comparing (16) and (17), we
finally obtain

$$\frac{\partial S}{\partial t} + H(x, \nabla S) = 0. \tag{18}$$

This equation describes the way the wave front evolves in time, and is just
the familiar Hamilton-Jacobi equation, already considered in Sec. 23.
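Equation (18) can be verified on the simplest possible example. The sketch below assumes a homogeneous isotropic plane medium with $f(x, x') = n_0 |x'|$, so that $H(x, p) = |p| / n_0$, and a disturbance starting at the origin at $t = 0$; the wave front at time $t$ is then the circle $n_0 |x| = t$, described by $S(x, t) = n_0 |x| - t$. The constant `N0`, the function names and the finite-difference step are our own assumptions.

```python
import math

N0 = 1.5  # assumed constant reciprocal velocity of the medium

def H(x, p):
    """Conjugate norm (11) for f(x, x') = N0*|x'|, namely |p| / N0."""
    return math.hypot(p[0], p[1]) / N0

def S(x, t):
    """Wave-front function: the front at time t is the circle N0*|x| = t."""
    return N0 * math.hypot(x[0], x[1]) - t

def hj_residual(x, t, h=1e-5):
    """Central-difference estimate of dS/dt + H(x, grad S), cf. (18)."""
    St = (S(x, t + h) - S(x, t - h)) / (2 * h)
    Sx = (S((x[0] + h, x[1]), t) - S((x[0] - h, x[1]), t)) / (2 * h)
    Sy = (S((x[0], x[1] + h), t) - S((x[0], x[1] - h), t)) / (2 * h)
    return St + H(x, (Sx, Sy))
```

Away from the origin, where $S$ is smooth, the residual should vanish up to discretization and rounding error.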

We now show the relation between the trajectories of the disturbance and
the general solution of (18). It will be recalled that as a wave front evolves
in time, each of its points goes into a succession of uniquely defined points
lying on neighboring wave fronts, thereby “sweeping out” a trajectory $\gamma$
which automatically minimizes the functional (2). Thus, if we specify a
one-parameter family of wave fronts

$$S(x, t) = 0, \tag{19}$$

where the parameter is the time $t$, every point $x_0$ on some “initial” surface
$S(x, t_0) = 0$ generates a trajectory. Choosing the point $x_0$ arbitrarily, we find
that the one-parameter family of surfaces (19) determines an $(n - 1)$-parameter
family of trajectories, such that one and only one trajectory of the
family passes through each point $x \in \mathcal{X}$. More generally, let

$$S(x, t, \alpha_1, \ldots, \alpha_n)$$

be a complete integral of the Hamilton-Jacobi equation depending on $n$
parameters $\alpha_1, \ldots, \alpha_n$. This complete integral determines an $(n + 1)$-parameter
family of surfaces⁷

$$S(x, t, \alpha_1, \ldots, \alpha_n) = 0, \tag{20}$$

which in turn determines a $(2n - 1)$-parameter family of trajectories. Then
the fact that the trajectories of the disturbances are the extremals of the
functional (2) leads to a geometric interpretation of Jacobi’s theorem (p. 91),
concerning the construction of a general solution of the system of Euler
equations of a functional from a complete integral of the corresponding
Hamilton-Jacobi equation.⁸

⁷ Since $S(x, t + t_0, \alpha_1, \ldots, \alpha_n) = 0$ is also an integral surface of the Hamilton-Jacobi
equation for arbitrary $t_0$, the family of surfaces (20) actually depends on $n + 1$ parameters.

⁸ It should be noted that we are considering a parametric problem, so that there
is dependence between the Euler equations (see Sec. 10 and Remark 4 of Sec. 37). As
a result, the general solution of the $2n$ equations obtained here contains only $2n - 1$
arbitrary constants.
6. The canonical equations. To derive the differential equations satisfied
by the trajectories of the disturbance, we might use Fermat’s principle,
minimizing the functional (2) and solving the corresponding Euler equations.
However, we prefer to use our geometric model of the propagation process.
If we introduce the time $t$ as the parameter along each trajectory, it follows
from

$$f(x, dx) = dt$$

and the homogeneity of $f(x, dx)$ in the argument $dx$ that

$$f\left(x, \frac{dx}{dt}\right) = 1, \tag{21}$$

i.e., the norm of the vector $dx/dt$ is identically equal to 1. Using (16), we
find that at each point $x$, the vector $dx/dt$ (tangent to the trajectory along
which the disturbance propagates) is related to the covariant vector $p$
(determining the hyperplane tangent to the wave front) by the formula

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} = H(x, p).$$

According to (21) and the definition (11) of the norm of vectors in $\tilde{\mathcal{T}}(x)$,
we see that

$$\sum_{i=1}^n \bar{p}_i \frac{dx^i}{dt} \leq H(x, \bar{p})$$

if $\bar{p}$ is any other vector in $\tilde{\mathcal{T}}(x)$. Thus, the expression

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} - H(x, p),$$

regarded as a function of $p$, achieves its maximum when $p$ is the vector
determining the hyperplane tangent to the wave front. Therefore, along
the trajectories, the conditions

$$\frac{\partial}{\partial p_i} \left( \sum_{k=1}^n p_k \frac{dx^k}{dt} - H(x, p) \right) = 0 \qquad (i = 1, \ldots, n)$$

must hold, i.e.,

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i} \qquad (i = 1, \ldots, n). \tag{22}$$


We have just obtained a system of ordinary differential equations of the
first order satisfied by the trajectories. Since these equations involve $2n$
unknown functions $x^1, \ldots, x^n$ and $p_1, \ldots, p_n$, we still need $n$ more equations
to completely describe the trajectories. To find the missing equations,
we use the fact that the surfaces representing the wave fronts at different times
are not arbitrary, but satisfy the Hamilton-Jacobi equation (18), while the
values $p_i$ at each point of a trajectory are the components $\partial S/\partial x^i$ determining
the hyperplane tangent to the wave front. In other words,

$$p_i = p_i(t) = \frac{\partial S(x(t), t)}{\partial x^i}$$

along each trajectory, and hence

$$\frac{dp_i}{dt} = \frac{d}{dt} \frac{\partial S}{\partial x^i} = \frac{\partial^2 S}{\partial x^i \, \partial t} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^i \, \partial x^k} \frac{dx^k}{dt}. \tag{23}$$

We now introduce the following notation: If the function $H(x, p)$, where
$p_i = \partial S/\partial x^i$, is regarded as a function of $x^1, \ldots, x^n$ and $t$, we indicate its
partial derivative with respect to $x^i$ by

$$\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}},$$

whereas if $H(x, p)$ is regarded as a function of the $2n$ variables $x^1, \ldots, x^n$
and $p_1, \ldots, p_n$, we indicate its partial derivative with respect to $x^i$ by

$$\left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}}.$$

Then, using the Hamilton-Jacobi equation (18), we can write (23) in the form

$$\frac{dp_i}{dt} = -\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^i \, \partial x^k} \frac{dx^k}{dt}. \tag{24}$$

Along the trajectories, we have

$$\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}} = \left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}} + \sum_{k=1}^n \left. \frac{\partial H}{\partial p_k} \right|_{x = \text{const}} \frac{\partial p_k}{\partial x^i} \tag{25}$$

and

$$\sum_{k=1}^n \frac{\partial^2 S}{\partial x^i \, \partial x^k} \frac{dx^k}{dt} = \sum_{k=1}^n \frac{\partial p_k}{\partial x^i} \frac{\partial H}{\partial p_k}. \tag{26}$$

Substituting (25) and (26) into (24), we obtain $n$ differential equations

$$\frac{dp_i}{dt} = -\left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}} \qquad (i = 1, \ldots, n).$$

Combining these equations with (22), we obtain a system of $2n$ differential
equations

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H(x, p)}{\partial x^i}, \tag{27}$$

where $i = 1, \ldots, n$. The integral curves of (27) are the trajectories along
which the disturbance propagates, i.e., the extremals of the functional (2).
The system (27) is of course the canonical system of Euler equations for the
variational problem associated with (2) [cf. Sec. 16], and represents the
so-called characteristic system associated with the Hamilton-Jacobi equation
(18) [cf. p. 90].
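To see the system (27) at work, one can integrate it numerically for a simple medium. The sketch below assumes an isotropic but inhomogeneous plane medium with $f(x, x') = n(x) |x'|$ and $n(x) = 1 + 0.5 x^2$ (the second coordinate), so that $H(x, p) = |p| / n(x)$; this medium, the Runge-Kutta integrator and all names are our own illustrative choices. Along a trajectory started with $H = 1$, the value of $H$ should stay equal to 1, in agreement with (21), and $p_1$ should stay constant because $n$ does not depend on $x^1$.

```python
import math

def n(x):
    """Assumed reciprocal velocity of the medium; positive for x[1] > -2."""
    return 1.0 + 0.5 * x[1]

def grad_n(x):
    """Gradient of n(x) for the example above."""
    return (0.0, 0.5)

def H(x, p):
    """Conjugate norm (11) for f(x, x') = n(x)*|x'|."""
    return math.hypot(p[0], p[1]) / n(x)

def rhs(state):
    """Right-hand sides of (27) for the state (x^1, x^2, p_1, p_2)."""
    x, p = state[:2], state[2:]
    norm_p, nx, g = math.hypot(p[0], p[1]), n(x), grad_n(x)
    dx = [p[i] / (nx * norm_p) for i in range(2)]      # dH/dp_i
    dp = [norm_p * g[i] / nx ** 2 for i in range(2)]   # -dH/dx^i
    return dx + dp

def rk4_step(state, dt):
    """One step of the classical fourth-order Runge-Kutta scheme."""
    def shifted(s, k, c):
        return [a + c * b for a, b in zip(s, k)]
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def trajectory(x0, direction, t_end=1.0, steps=1000):
    """Integrate (27) from x0, with p chosen along direction so that H(x0, p) = 1."""
    d = math.hypot(direction[0], direction[1])
    p0 = [n(x0) * direction[i] / d for i in range(2)]
    state = list(x0) + p0
    for _ in range(steps):
        state = rk4_step(state, t_end / steps)
    return state
```

For example, `trajectory((0.0, 0.0), (1.0, 0.3))` returns the final values of $x^1, x^2, p_1, p_2$ after unit time; the ray bends toward the region of larger $n$, i.e., of smaller velocity.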


PROBLEMS 


1. Prove that if f(x, x’) depends on direction only, then the disturbance 
propagates through the medium along straight lines. 


2. Prove that if f(x, x’) = f(x’) is independent of x, then f(x’) is precisely 
the time required to traverse the vector x’. 


3. Prove that every linear functional $\varphi[x]$ defined on an $n$-dimensional
Euclidean space of points $x = (x^1, \ldots, x^n)$ is of the form

$$\varphi[x] = p_1 x^1 + \cdots + p_n x^n,$$

where $p = (p_1, \ldots, p_n)$ is uniquely determined by $\varphi$.

4. Verify that formula (10) actually defines a norm for the elements $p$ of the
conjugate space $\tilde{\mathcal{T}}(x)$.

5. Why is the strict convexity condition (p. 211) needed in constructing
wave fronts for the disturbance?


Appendix II


VARIATIONAL METHODS 
IN PROBLEMS OF 
OPTIMAL CONTROL 


In this appendix, we sketch some results obtained by L. S. Pontryagin
and his students, in their investigations of the theory of optimal control
processes.¹ The connection between this subject and classical variational
theory will also be discussed.


¹ See L. S. Pontryagin, Optimal control processes, Usp. Mat. Nauk, 14, no. 1, 3 (1959);
V. G. Boltyanski, R. V. Gamkrelidze and L. S. Pontryagin, The theory of optimal
processes, I, The maximum principle, Izv. Akad. Nauk SSSR, Ser. Mat., 24, 3 (1960);
L. S. Pontryagin, V. G. Boltyanski, R. V. Gamkrelidze and E. F. Mishchenko, The
Mathematical Theory of Optimal Processes, translated and edited by K. N. Trirogoff and
L. W. Neustadt, Interscience Publishers, New York (1962). The more general case
where $\Omega$ is a topological space is considered in the first two references.

1. Statement of the problem. In many cases, finding the optimal “operating
regime” for a physical system (with a suitable optimality criterion) leads
to the following mathematical problem: Suppose the state of the physical
system is characterized by $n$ real numbers $x^1, \ldots, x^n$, forming a vector
$x = (x^1, \ldots, x^n)$ in the $n$-dimensional “phase space” $\mathcal{X}$ of the system,
and suppose the state varies with time in the way described by the system
of differential equations

$$\frac{dx^i}{dt} = f^i(x^1, \ldots, x^n, u^1, \ldots, u^k) \qquad (i = 1, \ldots, n). \tag{1}$$

Here, the $k$ real numbers $u^1, \ldots, u^k$ form a vector $u = (u^1, \ldots, u^k)$ belonging
to some fixed “control region” $\Omega$, which we take to be a subset of
$k$-dimensional Euclidean space, and the $f^i(x, u)$ are $n$ continuous functions
defined for all $x \in \mathcal{X}$ and all $u \in \Omega$.

Now suppose we specify a vector function $u(t)$, $t_0 \le t \le t_1$, called the 
control function, with values in $\Omega$. Then, substituting $u = u(t)$ in (1), we 
obtain the system of differential equations 

$$\frac{dx^i}{dt} = f^i[x, u(t)] \qquad (i = 1, \ldots, n). \tag{2}$$

For every initial value $x_0 = x(t_0)$, this system has a definite solution, called 
a trajectory. The aggregate 

$$U = \{u(t), t_0, t_1, x_0\}, \tag{3}$$

consisting of a control function $u(t)$, an interval $[t_0, t_1]$ and an initial value 
$x_0 = x(t_0)$, will be called a control process. Thus, to every control process, 
there corresponds a trajectory, i.e., a solution of (2). 
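
The passage from a control process to its trajectory can be illustrated numerically. The following Python sketch integrates a system of the form (2) by the classical fourth-order Runge-Kutta scheme. The particular right-hand sides ($dx^1/dt = x^2$, $dx^2/dt = u$), the scalar control $u(t) = \cos t$ with values in $[-1, 1]$, the step count and all function names are assumptions chosen for illustration; none of them comes from the text.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def trajectory(
    f: Callable[[Sequence[float], float], Vector],
    u: Callable[[float], float],
    x0: Sequence[float],
    t0: float,
    t1: float,
    steps: int = 2000,
) -> List[Vector]:
    """Integrate dx/dt = f(x, u(t)) on [t0, t1] with the classical RK4 scheme."""
    h = (t1 - t0) / steps
    x = list(x0)
    path = [x]
    for k in range(steps):
        t = t0 + k * h
        k1 = f(x, u(t))
        k2 = f([a + 0.5 * h * b for a, b in zip(x, k1)], u(t + 0.5 * h))
        k3 = f([a + 0.5 * h * b for a, b in zip(x, k2)], u(t + 0.5 * h))
        k4 = f([a + h * b for a, b in zip(x, k3)], u(t + h))
        x = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
        path.append(x)
    return path


def example_rhs(x: Sequence[float], u: float) -> Vector:
    """Assumed example system: dx^1/dt = x^2, dx^2/dt = u."""
    return [x[1], u]


def example_control(t: float) -> float:
    """Assumed control function with values in the control region [-1, 1]."""
    return math.cos(t)


# Control process U = {u(t), t0, t1, x0} with t0 = 0, t1 = pi, x0 = (0, 0).
path = trajectory(example_rhs, example_control, [0.0, 0.0], 0.0, math.pi)
x_final = path[-1]
```

For this choice the exact trajectory is $x^1(t) = 1 - \cos t$, $x^2(t) = \sin t$, which gives a simple check on the numerical result at $t = \pi$.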
Next, let 

$$f^0(x^1, \ldots, x^n, u^1, \ldots, u^k)$$

be a function which is defined, together with its partial derivatives 

$$\frac{\partial f^0}{\partial x^i} \qquad (i = 1, \ldots, n),$$

for all $x \in \mathcal{X}$ and $u \in \Omega$. To every control process $U$, we assign the number 

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt, \tag{4}$$

i.e., $J[U]$ is a functional defined on the set of control processes. Then, 
the control process (3) is said to be optimal if the inequality 

$$J[U] \le J[U^*]$$

holds for any other control process $U^*$ carrying the given point $x_0$ into the 
point $x_1$, i.e., such that the corresponding trajectory $x^*(t)$ satisfies the 
condition $x^*(t_1^*) = x_1$. By the optimal trajectory, we mean the trajectory 
corresponding to the optimal control process. Our aim is to find necessary 
conditions characterizing optimal control processes and optimal trajectories. 

It should be pointed out that in calling a control process optimal, it is 
assumed that some class of admissible control processes has been specified in 
advance. Here, we assume that the components $u^1(t), \ldots, u^k(t)$ of any 
admissible control process take values in $\Omega$, and are bounded and piecewise 
continuous (with left-hand and right-hand limits at every point of 
discontinuity). 

An important special case of the problem of optimal control is the situation 
where the functional (4) reduces to the integral 


$$\int_{t_0}^{t_1} dt,$$

representing the time it takes to go from the point $x_0$ to the point $x_1$. 
In this case, optimality means taking the least time to go from $x_0$ to $x_1$. 


2. Relation to the calculus of variations. The problem of optimal control 
is intimately related to certain traditional problems of the calculus of 
variations. In fact, the integral 


$$\int_{t_0}^{t_1} f^0(x, u)\,dt$$


can be regarded as a functional depending on $n + k$ functions $x^1, \ldots, x^n$, 
$u^1, \ldots, u^k$, i.e., as a functional defined on some class of curves in $n + k + 1$ 
dimensions. Since the functions $x^1, \ldots, x^n$, $u^1, \ldots, u^k$ are connected by the 
equations (1), we are dealing with the problem of finding a minimum subject 
to nonholonomic constraints (see p. 48). Since the boundary conditions are 
equivalent to the requirement that the desired optimal trajectory $x(t)$ begin 
at the point $x_0$ and end at the point $x_1$, the end points of the admissible curves 
in our $(n + k + 1)$-dimensional space have to lie on two $(k + 1)$-dimensional 
hyperplanes, determined by giving the coordinates $x^1, \ldots, x^n$ the fixed values 
$x_0^1, \ldots, x_0^n$ and $x_1^1, \ldots, x_1^n$. 

Thus, we see that the problem of optimal control is a variant of the problem 
of finding a minimum subject to subsidiary conditions. The problem of 
optimal control has the special feature that we specify in advance a definite 
class of admissible control processes, where the functions $u^1(t), \ldots, u^k(t)$ 
are required to take values in a given fixed region $\Omega$, but in general are not 
required to be continuous. 

We can easily show that the simplest $n$-dimensional variational problem, 
where the integrand does not depend on $t$ explicitly,² is a special case of the 
problem of optimal control. To this end, suppose that among the curves 
passing through two fixed points 

$$(x_0^1, \ldots, x_0^n), \qquad (x_1^1, \ldots, x_1^n),$$

it is required to find the curve for which the functional 

$$\int_{t_0}^{t_1} f^0\left(x^1, \ldots, x^n, \frac{dx^1}{dt}, \ldots, \frac{dx^n}{dt}\right) dt \tag{5}$$

has a minimum. To paraphrase this problem as a problem of optimal 
control, we need only write (5) in the form 

$$\int_{t_0}^{t_1} f^0(x^1, \ldots, x^n, u^1, \ldots, u^n)\,dt,$$

and take the system (1) to be simply 

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \ldots, n).$$


2 This condition is not really a restriction, since any functional can be transformed 
into this form, e.g., by going over to the parametric form of the problem. 


3. Necessary conditions for optimality. To find necessary conditions for 
a given control process and the corresponding trajectory to be optimal, we 
supplement the system of equations 

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 1, \ldots, n)$$

with the extra equation 

$$\frac{dx^0}{dt} = f^0(x, u),$$

where $f^0(x, u)$ is the integrand of the functional (4) which is to be minimized. 
At the same time, we supplement the initial conditions 

$$x^i(t_0) = x_0^i \qquad (i = 1, \ldots, n) \tag{6}$$

with the extra condition 

$$x^0(t_0) = 0. \tag{7}$$

For convenience, we introduce the $(n + 1)$-dimensional vector function 

$$\mathbf{x}(t) = (x^0(t), x(t)) = (x^0(t), x^1(t), \ldots, x^n(t)).$$
It is clear that if $U$ is an admissible control process and if $\mathbf{x} = \mathbf{x}(t)$ is the 
solution of the system³ 

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 0, 1, \ldots, n), \tag{8}$$

corresponding to $U$ and the initial conditions (6) and (7), then 

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt = x^0(t_1).$$

Thus, the problem of optimal control can be stated as follows: Find the 
admissible control process $U$ for which the solution $\mathbf{x}(t)$ of the system (8), 
satisfying the initial conditions (6) and (7), has the smallest possible value of 
$x^0(t_1)$. 
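
The device of adjoining the coordinate $x^0$ can be tried out on a small example. In the Python sketch below, the functional is evaluated as the final value of $x^0$ obtained by integrating the augmented system. The scalar system $dx^1/dt = u$, the integrand $f^0 = \frac{1}{2}[(x^1)^2 + u^2]$, the constant control $u \equiv 1$ and the function names are all illustrative assumptions.

```python
from typing import Callable, List


def augmented_rhs(xa: List[float], u: float) -> List[float]:
    """Right-hand side of the augmented system for the assumed example.

    xa = [x^0, x^1]; note that x^0 itself does not appear on the right.
    """
    x1 = xa[1]
    return [0.5 * (x1 * x1 + u * u),  # dx^0/dt = f^0(x, u)
            u]                        # dx^1/dt = f^1(x, u)


def shifted(base: List[float], slope: List[float], c: float) -> List[float]:
    """Return base + c * slope, componentwise."""
    return [b + c * s for b, s in zip(base, slope)]


def cost(u: Callable[[float], float], x_init: float,
         t0: float, t1: float, steps: int = 1000) -> float:
    """Return J[U] as x^0(t1), integrating the augmented system by RK4."""
    h = (t1 - t0) / steps
    xa = [0.0, x_init]  # x^0(t0) = 0 together with the given x^1(t0)
    for k in range(steps):
        t = t0 + k * h
        k1 = augmented_rhs(xa, u(t))
        k2 = augmented_rhs(shifted(xa, k1, 0.5 * h), u(t + 0.5 * h))
        k3 = augmented_rhs(shifted(xa, k2, 0.5 * h), u(t + 0.5 * h))
        k4 = augmented_rhs(shifted(xa, k3, h), u(t + h))
        xa = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
              for a, p, q, r, s in zip(xa, k1, k2, k3, k4)]
    return xa[0]


# With u = 1 and x^1(0) = 0 we have x^1(t) = t, so J = integral of (t^2 + 1)/2.
J = cost(lambda t: 1.0, 0.0, 0.0, 1.0)
```

Here the integral can be computed by hand, $\int_0^1 \frac{1}{2}(t^2 + 1)\,dt = \frac{2}{3}$, and the value returned as $x^0(t_1)$ should agree with it up to rounding.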

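Next, in addition to the variables $x^0, x^1, \ldots, x^n$, we introduce new variables 
$\psi_0, \psi_1, \ldots, \psi_n$ satisfying the following system of differential equations, known 
as the conjugate⁴ of the system (8): 

$$\frac{d\psi_i}{dt} = -\sum_{\alpha=0}^{n} \frac{\partial f^\alpha(x, u)}{\partial x^i}\,\psi_\alpha \qquad (i = 0, 1, \ldots, n). \tag{9}$$

3 Note that the functions $f^i$, and hence the functions $\Pi$ and $\mathcal{H}$ defined below, do not 
involve $x^0(t)$. 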

4 This system has the following geometric interpretation: In the space of vectors 
$(\psi_0, \psi_1, \ldots, \psi_n)$ conjugate to the space of vectors $(x^0, x^1, \ldots, x^n)$ [see p. 211], consider 
the hyperplane 

$$\sum_{\alpha=0}^{n} \psi_\alpha x^\alpha = c = \text{const}$$

passing through the initial point $(0, x_0^1, \ldots, x_0^n)$. Then the system (9) describes the 
“transport” of this hyperplane along the trajectories corresponding to solutions of the 
system (8). In other words, if the $\psi_\alpha$ satisfy (9) and the $x^\alpha$ satisfy (8) for $t_0 \le t \le t_1$, then 

$$\sum_{\alpha=0}^{n} \psi_\alpha(t)\,x^\alpha(t) = c \qquad (t_0 \le t \le t_1).$$


For more details, see the second of the references cited on p. 218. 

Let 

$$\psi(t) = (\psi_0(t), \psi_1(t), \ldots, \psi_n(t)),$$

and consider the following function of the variables $x^1, \ldots, x^n$, $\psi_0, \psi_1, \ldots, \psi_n$, 
$u^1, \ldots, u^k$: 

$$\Pi(\psi, x, u) = \sum_{\alpha=0}^{n} \psi_\alpha f^\alpha(x, u). \tag{10}$$

In terms of $\Pi$, we can write the equations (8) and (9) in the form 

$$\frac{dx^i}{dt} = \frac{\partial \Pi}{\partial \psi_i}, \qquad \frac{d\psi_i}{dt} = -\frac{\partial \Pi}{\partial x^i}, \tag{11}$$

where $i = 0, 1, \ldots, n$. The equations (11) remind us of the canonical system 
of Euler equations [see formula (11), p. 70]. However, they have a different 
meaning, since the canonical equations form a closed system, in which the 
number of equations equals the number of unknown functions, whereas (11) 
involves not only $x$ and $\psi$ but also the unknown function $u$, and hence (11) 
becomes a closed system only when $u$ is specified. In fact, in order to write 
equations for the optimal control problem resembling the canonical equations, 
we would have to use the function 

$$\mathcal{H}(\psi, x) = \sup_{u \in \Omega} \Pi(\psi, x, u), \tag{12}$$

instead of the function $\Pi(\psi, x, u)$.⁵ 
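
To see what taking the supremum over the control region amounts to, consider the following Python sketch. It evaluates the function (10) and approximates the function (12) by sampling the control region on a grid. The example is an assumption made for illustration: $n = 2$, $f^0 = 1$, $f^1 = x^2$, $f^2 = u$, with a scalar control confined to $[-1, 1]$; the function names and the grid size are likewise hypothetical.

```python
from typing import Sequence


def pi_function(psi: Sequence[float], x: Sequence[float], u: float) -> float:
    """Pi(psi, x, u) = sum of psi_a * f^a(x, u) for the assumed example.

    psi = (psi_0, psi_1, psi_2); x = (x^1, x^2), so x[1] is the coordinate x^2.
    """
    f = [1.0, x[1], u]  # f^0, f^1, f^2
    return sum(p * fa for p, fa in zip(psi, f))


def h_function(psi: Sequence[float], x: Sequence[float],
               samples: int = 201) -> float:
    """Approximate the supremum of Pi over u in [-1, 1] by a grid search."""
    grid = [-1.0 + 2.0 * j / (samples - 1) for j in range(samples)]
    return max(pi_function(psi, x, u) for u in grid)


psi_example = [-1.0, 0.5, -2.0]
x_example = [3.0, 4.0]
h_value = h_function(psi_example, x_example)
```

Since $\Pi$ is linear in $u$ in this example, the supremum is attained at an end point of the interval, and equals $\psi_0 + \psi_1 x^2 + |\psi_2|$. For control regions of higher dimension, or for integrands that are not linear in $u$, a grid search of this kind is only a rough approximation.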


4. The maximum principle. We can now state the following theorem, 
whose proof can be found in the references cited on p. 218: 


THEOREM (The maximum principle). Let $U = \{u(t), t_0, t_1, x_0\}$ be an 
admissible control process, and let $\mathbf{x}(t)$ be the corresponding integral curve 
of the system (8) passing through the point $(0, x_0^1, \ldots, x_0^n)$ for $t = t_0$, and 
satisfying the conditions 

$$x^1(t_1) = x_1^1, \ldots, x^n(t_1) = x_1^n$$

for $t = t_1$. Then if the control process $U$ is optimal, there exists a 
continuous vector function $\psi(t) = (\psi_0(t), \psi_1(t), \ldots, \psi_n(t))$ such that 

1. The function $\psi(t)$ satisfies the system (9) for $x = x(t)$, $u = u(t)$; 

2. For all $t$ in $[t_0, t_1]$, the function (10) achieves its maximum for 
$u = u(t)$, i.e., 

$$\Pi[\psi(t), x(t), u(t)] = \mathcal{H}[\psi(t), x(t)], \tag{13}$$

where the function $\mathcal{H}$ is defined by (12); 

3. The relations 

$$\psi_0(t_1) \le 0, \qquad \mathcal{H}[\psi(t_1), x(t_1)] = 0 \tag{14}$$

hold at the time $t_1$. Actually, if $\psi(t)$, $x(t)$ and $u(t)$ satisfy the system 
(8), (9) and the condition (13), the functions $\psi_0(t)$ and $\mathcal{H}[\psi(t), x(t)]$ 
turn out to be constants, and hence in (14) we can replace $t_1$ by any 
value of $t$ in $[t_0, t_1]$. 


5 The transition from $\Pi$ to $\mathcal{H}$ is analogous to the Legendre transformation, considered 
in Sec. 18. 


Remark 1. The maximum principle can often be used as a prescription for 
constructing the optimal trajectory, in the following way: For every fixed $\psi$ 
and $x$, we find the value of $u$ for which the expression 

$$\sum_{\alpha=0}^{n} \psi_\alpha f^\alpha(x, u)$$

takes its maximum. If this determines $u$ as a single-valued function 

$$u = u(\psi, x) \tag{15}$$

of $\psi$ and $x$, then, substituting (15) into the equations (8) and (9), we obtain 
a closed system of 2(n + 1) equations involving 2(n + 1) unknown functions. 
These are just the equations which have to be satisfied by the optimal 
trajectory. 
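
The prescription of Remark 1 can be followed through on a small example. In the Python sketch below, the control is eliminated by maximizing $\Pi$, and the resulting closed system is integrated numerically. Everything specific here is an assumption made for illustration: the scalar system $dx^1/dt = u$ with unrestricted $u$, the integrand $f^0 = \frac{1}{2}[(x^1)^2 + u^2]$, the normalization $\psi_0 = -1$, the initial values and the function names. The equations for $x^0$ and $\psi_0$ are omitted, since $\psi_0$ is constant and $x^0$ does not enter the other equations.

```python
import math
from typing import List

PSI0 = -1.0  # assumed constant value of psi_0 (a negative normalization)


def maximizing_control(psi1: float, x: float) -> float:
    """Value of u maximizing Pi = PSI0 * (x^2 + u^2) / 2 + psi1 * u over real u.

    Setting dPi/du = PSI0 * u + psi1 = 0 gives u = -psi1 / PSI0; this is a
    maximum because PSI0 < 0 makes Pi concave in u.
    """
    return -psi1 / PSI0


def closed_rhs(state: List[float]) -> List[float]:
    """Closed system in (x, psi_1) after substituting u = u(psi, x)."""
    x, psi1 = state
    u = maximizing_control(psi1, x)
    return [u,           # dx/dt     =  dPi/dpsi_1 = u
            -PSI0 * x]   # dpsi_1/dt = -dPi/dx     = -PSI0 * x


def integrate(state: List[float], t_end: float, steps: int = 1000) -> List[float]:
    """Integrate the autonomous closed system from t = 0 to t_end by RK4."""
    h = t_end / steps
    for _ in range(steps):
        k1 = closed_rhs(state)
        k2 = closed_rhs([s + 0.5 * h * k for s, k in zip(state, k1)])
        k3 = closed_rhs([s + 0.5 * h * k for s, k in zip(state, k2)])
        k4 = closed_rhs([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


x_end, psi1_end = integrate([1.0, 0.0], 1.0)
```

For the assumed initial values $x(0) = 1$, $\psi_1(0) = 0$, the closed system has the solution $x = \cosh t$, $\psi_1 = \sinh t$. In an actual problem the initial value of $\psi$ is not given, and must be chosen so that the trajectory reaches the prescribed end point.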


Remark 2. For the simple n-dimensional variational problem discussed 
on p. 220, the system (8), (9), or the equivalent system (11), together with the 
maximum principle, reduces to the usual system of Euler equations. To see 
this, consider the functional 


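$$\int_{t_0}^{t_1} f^0(x^1, \ldots, x^n, u^1, \ldots, u^n)\,dt \tag{16}$$

[cf. (5)], where 

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \ldots, n). \tag{17}$$

In this case, the function (10) is 

$$\Pi(\psi, x, u) = \psi_0 f^0(x, u) + \sum_{\alpha=1}^{n} \psi_\alpha u^\alpha, \tag{18}$$

and the system (11) becomes 

$$\frac{dx^0}{dt} = f^0(x, u), \qquad \frac{dx^i}{dt} = u^i,$$

$$\frac{d\psi_0}{dt} = 0, \qquad \frac{d\psi_i}{dt} = -\psi_0 \frac{\partial f^0(x, u)}{\partial x^i},$$

where $i = 1, \ldots, n$. Maximizing $\Pi(\psi, x, u)$, we find that 

$$\frac{\partial \Pi}{\partial u^i} = \psi_0 \frac{\partial f^0(x, u)}{\partial u^i} + \psi_i = 0,$$

$$\psi_i = -\psi_0 \frac{\partial f^0(x, u)}{\partial u^i} \qquad (i = 1, \ldots, n).$$

Since $d\psi_0/dt = 0$, we have $\psi_0 = \text{const}$, and hence 

$$\frac{d}{dt}\left[\frac{\partial f^0(x, u)}{\partial u^i}\right] = \frac{\partial f^0(x, u)}{\partial x^i}, \qquad \frac{dx^i}{dt} = u^i.$$

This is just the system of Euler equations corresponding to the functional 
(16), reduced to a system of first-order differential equations by introducing 
the derivatives $dx^i/dt = u^i$ as new functions (cf. p. 68). 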
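
As a worked instance of Remark 2, one may take an integrand of the familiar "kinetic minus potential" type. The choice of $f^0$ below, with $V(x)$ an arbitrary smooth function, is an assumption made purely for illustration, and $\psi_0 \neq 0$ is assumed in the last step.

```latex
% Assumed example: f^0(x,u) = (1/2) \sum_i (u^i)^2 - V(x), with V smooth.
\begin{align*}
\Pi(\psi, x, u) &= \psi_0 \Bigl[ \tfrac12 \sum_{i=1}^{n} (u^i)^2 - V(x) \Bigr]
                   + \sum_{i=1}^{n} \psi_i u^i, \\
\frac{\partial \Pi}{\partial u^i} = \psi_0 u^i + \psi_i = 0
  \quad &\Longrightarrow \quad \psi_i = -\psi_0 u^i, \\
\frac{d\psi_i}{dt} = -\frac{\partial \Pi}{\partial x^i}
  &= \psi_0 \frac{\partial V}{\partial x^i}, \\
-\psi_0 \frac{du^i}{dt} = \psi_0 \frac{\partial V}{\partial x^i}
  \quad &\Longrightarrow \quad
  \frac{du^i}{dt} = -\frac{\partial V}{\partial x^i}, \qquad \frac{dx^i}{dt} = u^i .
\end{align*}
```

The last line is the Euler system for this integrand, i.e., $d^2 x^i/dt^2 = -\partial V/\partial x^i$. Note that for $\psi_0 < 0$ the function $\Pi$ is concave in $u$, so that the stationary point found above is in fact a maximum.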


Remark 3. In Appendix 1, we have already encountered the fact that every 
propagation process can be described in two ways, either in terms of the 
trajectories along which the disturbance propagates (the “rays” in optics), 
or in terms of the motion of the wave front. The first approach leads to the 
canonical Euler equations (or, as in the example just considered, to the 
usual form of the Euler equations), i.e., a system of ordinary differential 
equations. The second approach leads to the Hamilton-Jacobi equation, 
i.e., a partial differential equation. Our maximum principle involves the 
study of trajectories, and in this sense is analogous to the method of canonical 
equations. The “wave front approach” to problems of optimal control 
has been developed by R. Bellman.⁶ 


5. Relation to Weierstrass’ necessary condition. We again consider the 
simple functional (16), (17), where the function $\Pi(\psi, x, u)$ is given by (18). 
Using (17), we can also write the functional (16) in the form 

$$\int_{t_0}^{t_1} f^0(x^1, \ldots, x^n, x^{\prime 1}, \ldots, x^{\prime n})\,dt. \tag{19}$$

The Weierstrass $E$-function for such a functional is⁷ 

$$E(x, x', z) = f^0(x, z) - f^0(x, x') - \sum_{i=1}^{n} (z_i - x^{\prime i}) f^0_{x^{\prime i}}(x, x'). \tag{20}$$


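6 See the relevant references cited in the Bibliography, p. 227. 

7 See p. 146. Note that $E$ is a function of three rather than four arguments, since (19) 
is independent of $t$. 

Using (18) and (20), we find that 

$$\begin{aligned}
\Pi(\psi, x, z) &- \Pi(\psi, x, x') - \sum_{i=1}^{n} (z_i - x^{\prime i}) \frac{\partial}{\partial u^i} \Pi(\psi, x, x') \\
&= \psi_0 f^0(x, z) + \sum_{\alpha=1}^{n} \psi_\alpha z_\alpha - \psi_0 f^0(x, x') - \sum_{\alpha=1}^{n} \psi_\alpha x^{\prime \alpha} - \sum_{i=1}^{n} (z_i - x^{\prime i}) (\psi_0 f^0_{x^{\prime i}} + \psi_i) \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') - \sum_{i=1}^{n} (z_i - x^{\prime i}) \psi_0 f^0_{x^{\prime i}} = \psi_0 E(x, x', z).
\end{aligned} \tag{21}$$

If the function $\Pi$ achieves its maximum for values of $u = x'$ which are 
interior points of the region $\Omega$, then 

$$\frac{\partial \Pi}{\partial u^i} = 0$$

at these points. Then, since $\psi_0 \le 0$, it follows from (21) that the condition 
(13) is equivalent to the condition 

$$E(x, x', z) \ge 0. \tag{22}$$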
This is Weierstrass’ necessary condition, with which we are already familiar 
(see p. 149). Thus, the maximum principle leads to another, independent 
derivation of (22). It can be shown that the formula 

$$\psi_0 E = \Pi(\psi, x, z) - \Pi(\psi, x, x') - \sum_{i=1}^{n} (z_i - x^{\prime i}) \frac{\partial}{\partial u^i} \Pi(\psi, x, x')$$


remains true for variational problems subject to constraints, i.e., for more 
general problems of optimal control. 

We have just proved the equivalence of the maximum principle and 
Weierstrass’ necessary condition (22) in the case where the set $\Omega$ of admissible 
values of the control function $u(t)$ is open, i.e., where every point of $\Omega$ is an 
interior point. In the case where the optimal control process involves values 
of $u(t)$ lying on the boundary of the region $\Omega$, the condition (22) is in general 
no longer valid. However, it can be shown that in such cases, the maximum 
principle continues to apply. 
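
The identity (21) is purely algebraic, and can be checked numerically for any particular integrand. The Python sketch below does this for $n = 2$ at randomly chosen points. The integrand $f^0(x, u) = [1 + (x^1)^2]\sqrt{1 + (u^1)^2 + (u^2)^2}$, which is convex in $u$, the sampling ranges and the function names are assumptions chosen for illustration only.

```python
import math
import random
from typing import List, Sequence


def f0(x: Sequence[float], u: Sequence[float]) -> float:
    """Assumed integrand, convex in u for every fixed x."""
    return (1.0 + x[0] ** 2) * math.sqrt(1.0 + u[0] ** 2 + u[1] ** 2)


def f0_du(x: Sequence[float], u: Sequence[float]) -> List[float]:
    """Partial derivatives of f0 with respect to u^1, u^2."""
    scale = (1.0 + x[0] ** 2) / math.sqrt(1.0 + u[0] ** 2 + u[1] ** 2)
    return [scale * u[0], scale * u[1]]


def pi_fn(psi: Sequence[float], x: Sequence[float], u: Sequence[float]) -> float:
    """The function Pi for the simple variational problem."""
    return psi[0] * f0(x, u) + psi[1] * u[0] + psi[2] * u[1]


def pi_du(psi: Sequence[float], x: Sequence[float], u: Sequence[float]) -> List[float]:
    """Partial derivatives of Pi with respect to u^1, u^2."""
    g = f0_du(x, u)
    return [psi[0] * g[0] + psi[1], psi[0] * g[1] + psi[2]]


def weierstrass_e(x, xp, z) -> float:
    """Weierstrass E-function E(x, x', z) for the integrand f0."""
    g = f0_du(x, xp)
    return f0(x, z) - f0(x, xp) - sum((zi - pi) * gi for zi, pi, gi in zip(z, xp, g))


def lhs_of_21(psi, x, xp, z) -> float:
    """Left-hand side of the identity: Pi(z) - Pi(x') - sum (z - x') dPi/du."""
    g = pi_du(psi, x, xp)
    return (pi_fn(psi, x, z) - pi_fn(psi, x, xp)
            - sum((zi - pi) * gi for zi, pi, gi in zip(z, xp, g)))


rng = random.Random(0)  # fixed seed so that the check is reproducible
max_gap = 0.0
min_e = float("inf")
for _ in range(1000):
    x = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
    xp = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
    z = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
    psi = [-rng.uniform(0.1, 3.0), rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
    e = weierstrass_e(x, xp, z)
    max_gap = max(max_gap, abs(lhs_of_21(psi, x, xp, z) - psi[0] * e))
    min_e = min(min_e, e)
```

For this integrand, `max_gap` should vanish up to rounding, in agreement with (21), and `min_e` should be nonnegative up to rounding, since an integrand convex in $u$ lies above its tangent planes.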


PROBLEMS 


1. State the maximum principle (p. 222) for the problem of “fastest motion” 
or “time-optimal problem,” where the functional (4) reduces to simply 

$$J[U] = \int_{t_0}^{t_1} dt.$$

Ans. In this case, we write 

$$P(\psi, x, u) = \sum_{\alpha=1}^{n} \psi_\alpha f^\alpha(x, u)$$

instead of (10), and in the system (11), $i$ need only range from 1 to $n$. The 
function $\mathcal{H}$ in the maximum principle is now replaced by 

$$H(\psi, x) = \sup_{u \in \Omega} P(\psi, x, u) = \mathcal{H}(\psi, x) - \psi_0.$$

Finally, the relations (14) are replaced by 

$$H[\psi(t_1), x(t_1)] = -\psi_0 \ge 0,$$

which actually holds for any $t$ in $[t_0, t_1]$. 


2. Consider the differential equation 

$$\frac{d^2 x}{dt^2} = u, \tag{a}$$

where the control function $u$ obeys the condition $|u| \le 1$. Introducing the 
“phase coordinates” $x^1$ and $x^2$, we can write (a) as a system 

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = u. \tag{b}$$

What trajectory corresponds to the fastest motion from a given initial point 
$x_0$ to the final point $x_1 = (0, 0)$? 

Hint. The auxiliary variables $\psi_1$ and $\psi_2$ obey the equations 

$$\frac{d\psi_1}{dt} = 0, \qquad \frac{d\psi_2}{dt} = -\psi_1.$$

By the maximum principle (modified in accordance with Prob. 1), 

$$u(t) = \operatorname{sgn} \psi_2(t) = \operatorname{sgn}(c_2 - c_1 t),$$

where $c_1$ and $c_2$ are constants, $\operatorname{sgn} x = x/|x|$ and $u(t)$ can only change sign 
once. Integrate the system (b) for $u = \pm 1$, and draw the corresponding 
families of parabolas in the $(x^1, x^2)$ plane, analyzing the various possibilities 
(corresponding to different initial positions $x_0$). 
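
The integration suggested in the hint can be carried out in closed form, and the result is easily checked by a short computation. The Python sketch below advances the system (b) exactly under a constant control, and builds a schedule with at most one switch that brings a given point to the origin. The switching rule used (compare the point with the curve $x^1 = -\frac{1}{2} x^2 |x^2|$ formed by the two parabolic arcs through the origin) is the well-known answer to this problem, stated here without proof; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def advance(x: Point, u: float, tau: float) -> Point:
    """Exact solution of dx^1/dt = x^2, dx^2/dt = u after time tau, u constant."""
    x1, x2 = x
    return (x1 + x2 * tau + 0.5 * u * tau * tau, x2 + u * tau)


def bang_bang_schedule(x: Point) -> List[Tuple[float, float]]:
    """Return [(u, duration), ...] with u = +1 or -1 and at most one switch."""
    x1, x2 = x
    s = x1 + 0.5 * x2 * abs(x2)  # s = 0 on the switching curve
    if s == 0.0:
        if x2 == 0.0:
            return []  # already at the origin
        # On the switching curve: a single arc leads to the origin.
        return [(-math.copysign(1.0, x2), abs(x2))]
    first = -1.0 if s > 0.0 else 1.0
    # Along an arc with control u, the quantity x^1 - (x^2)^2 / (2u) is constant;
    # matching it with the final arc gives |x^2| = r at the switching point.
    r = math.sqrt(0.5 * x2 * x2 - first * x1)
    return [(first, r - first * x2), (-first, r)]


def steer(x: Point) -> Tuple[Point, float]:
    """Apply the schedule; return the final point and the total time."""
    total = 0.0
    for u, tau in bang_bang_schedule(x):
        x = advance(x, u, tau)
        total += tau
    return x, total


final_point, total_time = steer((1.0, 0.0))
```

Starting from $(1, 0)$, the schedule is $u = -1$ for unit time, which leads to the point $(\frac{1}{2}, -1)$, followed by $u = +1$ for unit time, which leads to the origin, so that the total time is 2.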


3. Study the same “time-optimal problem” for the equation 

$$\frac{d^2 x}{dt^2} + x = u, \qquad |u| \le 1.$$

Hint. The appropriate system is now 

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = -x^1 + u.$$


4. Study the same “time-optimal problem” for the system 

$$\frac{dx^1}{dt} = x^2 + u^1, \qquad \frac{dx^2}{dt} = -x^1 + u^2,$$

where there are two control functions $u^1$, $u^2$ obeying the conditions $|u^1| \le 1$, 
$|u^2| \le 1$. 


Comment. For a detailed discussion of Probs. 2-4, see Chap. 1, Sec. 5 
of the book cited on p. 218. 


5. Verify the relations (14) for the simple variational problem (16) discussed 
in Remark 2, p. 223. 


Hint. Use Euler’s theorem on positive-homogeneous functions (Chap. 2, 
Prob. 6). 